On the Utility of Inference Mechanisms

On the Utility of Inference Mechanisms. Ethan Blanton, Sonia Fahmy, Greg N. Frederickson. June 23, 2009.

On the Utility of Inference Mechanisms. We present an algorithm for determining when, and whether, an inference mechanism can be used to reduce measurement load on the network. Inference mechanisms reduce the traffic generated by active measurements, at some cost in accuracy, and their own overhead can be large. Our technique applies inference where it is beneficial, and retains more accurate direct measurements otherwise. We accomplish this using spanning forests and estimates of edge connectivity.

Mixing Inference and Direct Measurement

Measurement Infrastructures. We consider measurement infrastructures which perform on-demand active measurements for end hosts. Measurement schedules are not known a priori. Measurement infrastructure providers want to minimize active measurement traffic; the infrastructure's users are end-host applications.

Inference Mechanisms. The literature is full of inference mechanisms: GNP [1], Vivaldi [2], IDMaps [3], NetQuest [4], and others, covering latency, jitter, available bandwidth, and more.
[1] T. S. Eugene Ng and Hui Zhang. Towards Global Network Positioning. ACM SIGCOMM Workshop on Internet Measurement, 2001.
[2] Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris. Vivaldi: A Decentralized Network Coordinate System. ACM SIGCOMM, 2004.
[3] Paul Francis, Sugih Jamin, Cheng Jin, Yixin Jin, Danny Raz, Yuval Shavitt, and Lixia Zhang. IDMaps: A Global Internet Host Distance Service. IEEE/ACM Transactions on Networking, Oct. 2001.
[4] Han Hee Song, Lili Qiu, and Yin Zhang. NetQuest: A Flexible Framework for Large-Scale Network Measurement. ACM SIGMETRICS, 2006.

Utilizing Inference Effectively. Observation: inference mechanisms lose fidelity relative to direct measurement. Observation: due to the hidden constant, inference mechanisms can be more expensive than direct measurement. Observation: a measurement infrastructure sees measurement requests from disparate applications. Conclusion: a measurement infrastructure is uniquely situated to determine when a collection of direct measurement requests can be replaced by inference to reduce measurement load on the network.

When to use inference. Recall that many inference mechanisms use O(n) measurement probes to estimate the properties of O(n^2) paths. Assume that we can identify the hidden constant in that O(n); it can be large. We call it k. Given a set of measurements requested from a measurement service and the constant k, we can identify a tipping point at which the total number of requested measurements in the system exceeds the cost of an all-hosts inference.

When to use inference, cont'd. (Figure: example measurement graph with edges marked as inference versus non-inference; k = 3.) No inference: 32 measurements. All-hosts inference: 36 measurements. With selective inference: 28 measurements.

Asking the Right Question. To determine whether inference is beneficial, we have to answer a slightly different question: does there exist a subset of hosts of size n such that the number of measurements requested between those hosts is greater than k*n? In the previous example, the answer is yes: the gold-colored hosts. The task, then, is to identify all such subsets.
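
As a concrete restatement of this question, here is a minimal Python sketch (our own illustration, not code from the paper) that checks the condition for one candidate subset. It assumes a networkx-style undirected graph of requested measurements; the function name subset_benefits is ours.

    def subset_benefits(G, hosts, k):
        # G: undirected graph whose edges are requested measurements.
        # hosts: a candidate subset of vertices; k: the inference constant.
        hosts = set(hosts)
        internal = sum(1 for u, v in G.edges() if u in hosts and v in hosts)
        # Inference over this subset costs k * |hosts| probes; it pays off
        # only when the subset's internal measurement requests exceed that.
        return internal > k * len(hosts)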

Our Solution. Some facts: a k-regular graph is a graph in which every node has degree exactly k; this is precisely our tipping point. Finding a k-regular subgraph in a general graph is NP-complete. Therefore, we seek a solution which trades optimality for tractable computational complexity.

The Approach We use a set of minimum spanning forests to identify edges which are likely to be part of low edge-count cuts of the graph. Edges identified as likely candidates are pruned from the graph. The connected components which remain are assumed to be sets of hosts which would benefit from having their internal direct measurements replaced with an inference.

The Algorithm. Represent hosts as vertices and requested measurements as edges. Create a set of graphs identical to the unweighted input graph, and assign every edge in these new graphs a weight drawn from a uniform random distribution. For each of these graphs, find a minimum spanning forest using the weights just assigned. Remove from the input graph any edge which appears in more of these forests than some threshold score. Determine the remaining connected components, and return them. (A code sketch of these steps follows.)
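
The following Python sketch (our illustration using networkx, not the authors' implementation; the function name inference_groups and the default parameter values are assumptions) walks through these steps end to end.

    import random
    from collections import Counter
    import networkx as nx

    def inference_groups(G, f=50, s=None, k=3):
        # Return connected components whose internal direct measurements
        # are candidates for replacement by an all-hosts inference.
        if s is None:
            s = f / k                    # threshold of roughly forests/k (see later slides)
        scores = Counter()
        for _ in range(f):
            H = G.copy()
            for u, v in H.edges():
                H[u][v]["weight"] = random.random()   # uniform random weight
            # Kruskal on a (possibly disconnected) graph yields a minimum
            # spanning forest.
            msf = nx.minimum_spanning_edges(H, algorithm="kruskal", data=False)
            for u, v in msf:
                scores[frozenset((u, v))] += 1
        pruned = G.copy()
        pruned.remove_edges_from(
            e for e in G.edges() if scores[frozenset(e)] > s)
        return list(nx.connected_components(pruned))

In practice one would also verify, for each returned component V_i, that its internal request count actually exceeds k*|V_i| before switching it to inference.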

Bounds on the expected score. Random spanning forests give us a quick estimate of edge connectivity. For any edge, we can lower bound the probability that it appears in each spanning forest in terms of the most constricting cut it crosses. Additionally, we can upper bound the probability that the actual score for an edge falls significantly below the lower bound on its expected score.

Bounds on the expected score, cont'd. Let: e be an edge in the input graph; (V', E') be the connected component of the input graph containing e; V_1 and V_2 be disjoint sets of vertices forming the two separated halves of a minimum cut of (V', E'); and c(V_1, V_2) be the number of edges of the cut between V_1 and V_2. Lemma 1: The probability that edge e is an edge in F_i is at least 1/c(V_1, V_2), and the expected score for e is thus at least f/c(V_1, V_2), where f is the number of forests. Furthermore, Pr(score(e) <= f/c(V_1, V_2) - alpha*f) <= exp(-2*alpha^2) for any constant alpha.
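
A quick way to see Lemma 1 in action is to simulate it on a graph whose minimum cut is known. The sketch below (our own check, not from the paper) builds two cliques joined by c bridge edges, so that the bridges form the minimum cut, and estimates how often one bridge appears in a random-weight spanning forest; the observed rate should be at least 1/c.

    import random
    import networkx as nx

    def bridge_inclusion_rate(c=4, clique_size=10, trials=2000):
        # Two cliques joined by c bridges; with clique_size > c + 1 the
        # bridges are the minimum cut, so c(V_1, V_2) = c.
        G = nx.union(nx.complete_graph(clique_size),
                     nx.complete_graph(clique_size), rename=("a", "b"))
        bridges = [("a%d" % i, "b%d" % i) for i in range(c)]
        G.add_edges_from(bridges)
        target = bridges[0]
        hits = 0
        for _ in range(trials):
            for u, v in G.edges():
                G[u][v]["weight"] = random.random()
            forest = set(nx.minimum_spanning_edges(G, data=False))
            if target in forest or (target[1], target[0]) in forest:
                hits += 1
        return hits / trials   # Lemma 1 predicts a rate of at least 1/c

    print(bridge_inclusion_rate())   # roughly 0.25 or somewhat above for c = 4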

Implications for Forests and the Threshold. Lemma 1 shows that the expected score of an edge is a function of both the number of spanning forests, which we choose, and c(V_1, V_2), which depends on the structure of the graph. The number of forests trades off complexity against accuracy. The bounds on the expected score indicate that the threshold score must be essentially proportional to the number of forests. Additionally, the threshold score must be inversely related to k, as higher values of k indicate a higher cost of inference and thus require more aggressive pruning.

Experimental Evaluation We evaluate this algorithm on a number of graphs: Simple graphs of interesting topology. Large graphs having properties taken from the popular UUSee streaming television service in China. The key measure of comparison is the number of measurements required to provide measurement results for all requested paths.

Simple Topology. (Figure: example graph with edges marked as inference versus non-inference; k = 3, n = 17.) Whole-graph inference: kn = 51 measurements. No inference: 58 measurements. Optimal inference: 50 measurements.

Simple Topology, cont'd. This graph is trivial, because the two bridge edges appear in every spanning tree. Note that, for every run, there is a score above which every threshold score yields an optimal result. Things get trickier if we add more bridges between the fully connected subgraphs. (Figure: optimal threshold-score range versus number of bridges, for 50 forests and k = 3; threshold scores range from 0 to 50, bridges from 0 to 16, and the level forests/k = 17 is marked.)

Small World Topology. Ten random small-world graphs representing a single UUSee [5] channel: 2,500 vertices each, about 50,000 edges each, clustering coefficient of about 2.5, average path length of about 2.3. As before, our metric of interest is the total number of measurements required. Unfortunately, in this case we do not know the ideal solution, so we can compare only against full-graph inference and no inference.
[5] Chuan Wu, Baochun Li, and Shuqiao Zhao. Magellan: Charting Large-Scale Peer-to-Peer Live Streaming Topologies. ICDCS, 2007.
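
For a rough reproduction, graphs of comparable scale can be generated with networkx; the slides do not give the actual generator or its parameters, so the Watts-Strogatz values below (degree 40 yields roughly 50,000 edges on 2,500 vertices; rewiring probability 0.3) are only illustrative guesses.

    import networkx as nx

    def make_topology(n=2500, degree=40, rewire_p=0.3, seed=None):
        # n * degree / 2 = 50,000 edges; rewire_p trades clustering
        # against average path length.
        return nx.connected_watts_strogatz_graph(n, degree, rewire_p, seed=seed)

    topologies = [make_topology(seed=i) for i in range(10)]   # ten random graphs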

Small World Topology, cont'd. (Figure: total number of edges versus threshold score / forests, for k = 3, 5, and 15, with curves for 20, 50, and 100 forests; the edge counts range roughly over 7,400-7,500, 12,100-12,500, and 30,000-38,000 respectively.) Substantial savings are realized over the original 53,000-edge topologies, and savings over full-graph inference increase with k. Note the minimum at threshold scores just smaller than forests/k (the vertical line in each plot). Larger numbers of forests produce lower minima and broader troughs.

Conclusions. Inference mechanisms can induce a heavier load than sparse direct measurements. Given a set of hosts and the measurements requested between them, we can efficiently identify subsets of hosts which would benefit from using an established inference algorithm rather than performing direct measurements. Savings on small-world graphs representative of popular peer-to-peer applications' communication patterns are measurable. More accurate direct measurements can be preserved where inference is not of sufficient benefit.

Thanks for listening. Questions?

Formalizing the question. More precisely: we construct a graph representing measurement hosts as vertices and requested measurements as edges. Given such a graph G = (V, E), our goal is to transform it into another graph G' = (V, E') such that we minimize |E'|, where 0 <= |E'| <= |E| and 0 <= |E'| <= k|V|. The only operation we can use in this minimization is replacing all edges among a subset V_i of V by k|V_i| edges. (That is, replace direct measurements with inference.)
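
To make the objective concrete, the hypothetical helper below (our own illustration, not code from the paper) computes |E'| for a proposed set of inference groups: direct measurements whose endpoints do not share a group, plus k*|V_i| probes per group.

    def measurement_cost(G, groups, k):
        # groups: list of vertex sets V_i whose internal edges are replaced
        # by k * |V_i| inference probes; G: graph of requested measurements.
        group_of = {v: i for i, vs in enumerate(groups) for v in vs}
        direct = sum(
            1 for u, v in G.edges()
            if group_of.get(u) is None or group_of.get(u) != group_of.get(v))
        return direct + k * sum(len(vs) for vs in groups)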

The Algorithm.

score(e) = sum for i = 1..f of (1 if e is in F_i, 0 otherwise)

def inference_groups(G, f, s):
    for i in 1..f do
        let V(G_i) = V(G)
        let E(G_i) = E(G)
        for e in E(G_i) do
            let wt(e) = random()
        let F_i = MSF(G_i)
    for e in E(G) do
        if score(e) > s then
            let E(G) = E(G) \ {e}
    return components(G)

Inputs: G is the input graph; f is the number of minimum spanning forests to be created; s is the number of forests in which an edge must appear to be pruned. Output: connected components for which internal measurements should be replaced by inference.

The Algorithm, step by step: create a set of f graphs identical to the unweighted graph G, except that in each graph every edge is assigned a weight from a uniform random distribution; for each of these graphs, find a minimum spanning forest using the weights we just assigned; remove any edges which appear in more than s of these forests; finally, determine the remaining connected components, and return them.

Complexity. The complexity of a minimum spanning forest algorithm such as Kruskal's algorithm is O(|E| log |E|). This is too slow for on-line computation over incoming measurement requests in a large measurement infrastructure. However, there exist algorithms which compute updates to a minimum spanning forest in amortized O(log^4 |V|) time per insertion or deletion, with a worst case of O(|V|^(1/2)). With these algorithms, we expect to compute on-line changes to very large graphs in reasonable time.

Topology 1, cont'd. (Figure: f versus the value of s above which all results are optimal for Synthetic Topology 1, aggregate of ten runs; min-max range and average shown; s ranges from 0 to 140, f from 0 to 500.)