On the placement of traceroute-like topology discovery instrumentation

Similar documents
Network Delay Model for Overlay Network Application

Complex networks: A mixture of power-law and Weibull distributions

Analysis and Comparison of Internet Topology Generators

DOLPHIN: THE MEASUREMENT SYSTEM FOR THE NEXT GENERATION INTERNET

AS Connectedness Based on Multiple Vantage Points and the Resulting Topologies

Relevance of Massively Distributed Explorations of the Internet Topology: Qualitative Results

Complex Network Metrology

Complex Network Metrology

BOSAM: A Tool for Visualizing Topological Structures Based on the Bitmap of Sorted Adjacent Matrix

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

The Beacon Number Problem in a Fully Distributed Topology Discovery Service

Robustness of Centrality Measures for Small-World Networks Containing Systematic Error

An Empirical Study of Routing Bias in Variable-Degree Networks

Deployment of an Algorithm for Large-Scale Topology Discovery

On Routing Table Growth

arxiv:cs/ v1 [cs.ni] 29 May 2006

A Radar for the Internet

Small World Properties Generated by a New Algorithm Under Same Degree of All Nodes

The missing links in the BGP-based AS connectivity maps

UDP PING: a dedicated tool for improving measurements of the Internet topology

AS Router Connectedness Based on Multiple Vantage Points and the Resulting Topologies

Science foundation under Grant No , DARPA award N , TCS Inc., and DIMI matching fund DIM

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Deployment of an Algorithm for Large-Scale Topology Discovery

On the Utility of Inference Mechanisms

QoS-Aware Hierarchical Multicast Routing on Next Generation Internetworks

Sampling Biases in IP Topology Measurements

Design and Implementation of an Anycast Efficient QoS Routing on OSPFv3

Population size estimation and Internet link structure

Memory Placement in Network Compression: Line and Grid Topologies

On Pairwise Connectivity of Wireless Multihop Networks

An Internet router level topology automatically discovering system

On characterizing BGP routing table growth

Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web

Virtual Multi-homing: On the Feasibility of Combining Overlay Routing with BGP Routing

Clustering the Internet Topology at the AS-level

Non-zero disjoint cycles in highly connected group labelled graphs

Sampling Biases in IP Topology Measurements

Network Maps beyond Connectivity

Complementary Graph Coloring

On the Diversity, Stability and Symmetry of End-to-End Internet Routes

Hierarchical Addressing and Routing Mechanisms for Distributed Applications over Heterogeneous Networks

DESIGN AND ANALYSIS OF ALGORITHMS GREEDY METHOD

Characterizing and Modelling Clustering Features in AS-Level Internet Topology

Fundamental Properties of Graphs

Constructing Connected Dominating Sets with Bounded Diameters in Wireless Networks

An Efficient Algorithm for AS Path Inferring

A Topological Analysis of Monitor Placement

The Internet Structure

Chapter 9 Graph Algorithms

On Reshaping of Clustering Coefficients in Degreebased Topology Generators

Data gathering using mobile agents for reducing traffic in dense mobile wireless sensor networks

LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS

Building a low-latency, proximity-aware DHT-based P2P network

Behavioral Graph Analysis of Internet Applications

AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems

Chapter 9 Graph Algorithms

11/22/2016. Chapter 9 Graph Algorithms. Introduction. Definitions. Definitions. Definitions. Definitions

Measuring the Degree Distribution of Routers in the Core Internet

Efficient Counting of Network Motifs

Chapter 9 Graph Algorithms

CMPSC 250 Analysis of Algorithms Spring 2018 Dr. Aravind Mohan Shortest Paths April 16, 2018

Measuring the Degree Distribution of Routers in the Core Internet

CONSTRUCTION AND EVALUATION OF MESHES BASED ON SHORTEST PATH TREE VS. STEINER TREE FOR MULTICAST ROUTING IN MOBILE AD HOC NETWORKS

Toward an Atlas of the Physical Internet

AN EFFICIENT MAC PROTOCOL FOR SUPPORTING QOS IN WIRELESS SENSOR NETWORKS

AN AD HOC network consists of a collection of mobile

Diagnosis of TCP Overlay Connection Failures using Bayesian Networks

The Buss Reduction for the k-weighted Vertex Cover Problem

Object replication strategies in content distribution networks

Internet Anycast: Performance, Problems and Potential

A New Splitting-Merging Paradigm for Distributed Localization in Wireless Sensor Networks

Graph Algorithms. Many problems in networks can be modeled as graph problems.

ScienceDirect. Analogy between immune system and sensor replacement using mobile robots on wireless sensor networks

A Hybrid Peer-to-Peer Architecture for Global Geospatial Web Service Discovery

A Reduction of Conway s Thrackle Conjecture

Survivability with P-Cycle in WDM Networks

Dynamic Routing Tables Using Simple Balanced. Search Trees

Internet-Scale Simulations of a Peer Selection Algorithm

Congestion Propagation among Routers in the Internet

In this chapter, we consider some of the interesting problems for which the greedy technique works, but we start with few simple examples.

An Algorithmic Approach to Graph Theory Neetu Rawat

SELF-HEALING NETWORKS: REDUNDANCY AND STRUCTURE

Constructing arbitrarily large graphs with a specified number of Hamiltonian cycles

IT IS increasingly the case that a given service request from a

Variable Length and Dynamic Addressing for Mobile Ad Hoc Networks

The Impact of Clustering on the Average Path Length in Wireless Sensor Networks

B. Krishna Sagar et al. / International Journal of Research in Modern Engineering and Emerging Technology

Reducing Directed Max Flow to Undirected Max Flow and Bipartite Matching

DYNAMIC SEARCH TECHNIQUE USED FOR IMPROVING PASSIVE SOURCE ROUTING PROTOCOL IN MANET

Thwarting Traceback Attack on Freenet

Incentives for BGP Guided IP-Level Topology Discovery

DIAL: A Distributed Adaptive-Learning Routing Method in VDTNs

Modeling and Analysis of Random Walk Search Algorithms in P2P Networks

Depth First Search A B C D E F G A B C 5 D E F 3 2 G 2 3

Employment of Multiple Algorithms for Optimal Path-based Test Selection Strategy. Miroslav Bures and Bestoun S. Ahmed

Harnessing Internet Topological Stability in Thorup-Zwick Compact Routing

A Connection between Network Coding and. Convolutional Codes

Computational Discrete Mathematics

Institutionen för datavetenskap Department of Computer and Information Science

Transcription:

On the placement of traceroute-like topology discovery instrumentation Wei Han, Ke Xu State Key Lab. of Software Develop Environment Beihang University Beijing, P. R. China (hanwei,kexu)@nlsde.buaa.edu.cn Abstract An accurate map of the Internet is very important for studying the network s internal structure and network management. The main approach to map the Internet is to collect information from a set of sources by using traceroutelike probes. In a typical mapping project, active measurement sources are relatively scarce while traceroute destinations are plentiful, which makes the sampled graph quite different from the original one. So, it becomes very important to determine how to place these sources such that the sampled graph can be closer to the original one, especially in the case that the number of sources is limited. In this paper, we investigate the relationship between the placement of traceroute sources and their sampled result, which, to our knowledge, has not been systematically studied before. Based on the relationship, we propose a method on how to place the traceroute sources. We show that the graph sampled from sources selected by our method is more accurate than the ones randomly selected. We also validate our conclusion using the raw trace data of skitter project. (Abstract) Keywords- topology discovery; traceroute sources; placement I. INTRODUCTION A highly accurate map of the Internet topology is a prerequisite to model, analyze and test the Internet. It is also very important for studying the network s internal structure and network management. Now, the main approach to map the Internet is to use traceroute, which can report the interfaces along the IP path from a source to a destination. Typically, a traceroute mapping project has several distributed sources and a large number of destinations. The topology graph can be obtained by merging the traceroute results of each source. Note that the sampled graph is an interface-level topology graph in which each node corresponds to an interface and each link corresponds to the connection between interfaces. We can also get the AS-level topology by converting each interface to its AS number, which can be acquired using whois[1]. Traceroute-like sampling of the Internet has been widely used in a lot of topology discovery systems [2, 3, 4, 5, 6]. Traceroute sources require deployment of dedicated measurement infrastructure, so they are always very scarce compared with the destinations. For example, skitter[2], a very famous project of topology discovery and analysis, sends traceroute probes from 25 sources deployed in America, Europe, Japan and Australia to more than 971000 destinations. The subgraph induced by a single source can be regarded as a spanning tree whose root is the source. And the overall graph is obtained by merging these subgraphs. Due to the limitation of the number of sources, the merged graph can be considerably different from the original one [13]. Lakhina et al. [7] find that traceroute probes are more likely to find nodes and links very close to the source. Nodes and links far from the source have a very low possibility to be explored. DIMES[8] implements the idea of traceroute@home[9] proposed by Donnet et al. and calls for volunteers to download the agent and become a participant. Up to Feb, 2006, there are 12769 probing agents for DIMES. Shavitt and Shir [10] analyze the result of DIMES and show that by adding traceroute sources placed at the periphery of the network, DIMES can find many peering links between small ISPs, while these links can hardly be found by other projects with limited sources. While more traceroute sources are needed in order to get an accurate topology graph, deployment of the instrumentation can be quite costly and more sources would result in sending excessive redundant probes into the Internet [9], which may affect the usual use of Internet. Therefore, when the number of sources is fixed, determining how to place these sources so that the sampled graphs can be closer to the original ones becomes a critical problem. Most topology discovery projects only consider the factor of geography when placing their sources and make them geographically distributed in the Internet. The main contribution of this paper is a method to place the traceroute sources based on the analysis of the features of traceroute. We show that the graph sampled from sources selected using our method is more accurate than the ones randomly selected. We also validate our method using the raw trace data of skitter. The rest of the paper is organized as follows. First, we present some related work on the placement of Internet instrumentation in section Ⅱ. In section Ⅲ, we establish some basic definitions for the topology discovery problem.

In section Ⅳ, we analyze the features of traceroute-like probes. Based on this, we derive the relationship between the placement of the traceroute sources and the graphs sampled from the sources in section Ⅴ, and validate the relationship using both the trace data of a simulated network and the raw data of skitter[2]. Then, we propose a method on the placement of traceroute sources in section Ⅵ. Finally, we summarize, conclude and discuss future work in section Ⅶ. II. RELATED WORK Barford et al. [11] study the marginal utility of adding traceroute sources and destinations. They find that the marginal utility of adding traceroute sources beyond the second source diminishes quickly and it is more important to add destinations than sources. They also argue that the diminishing marginal utility does not imply that the overall coverage obtained is high. However, they can not analyze the coverage of the sampled graph because the work is based on the trace data of skitter[2]. We will verify their point in a simulated network and work out how the low coverage of the sampled graph relates to the features of traceroute probes. Dall'Asta et al. [12] show that the probability that a node or an edge can be detected by traceroute probes depends on the betweenness centrality of the element and the density of traceroute sources and destinations. They also point out that placing the sources and destinations on low-betweenness nodes can increase the coverage of the sampled graph. But this strategy is not applicable in real traceroute probes because determining the betweenness of vertices is hard unless the whole topology graph is known. They recommend that sources should be placed on the low-connectivity vertices because of the correlation between connectivity and betweenness. However, this method is not verified by any experiment. In section Ⅵ, we will perform an experiment with this method and compare it with the method we propose. Jamin et al. [13] study the algorithms for the placement of Internet instrumentation in the context of the IDMaps project[14]. Their work focuses on how to place a fixed set of measurement on generated topologies to minimize the maximum distance between a node and the nearest center. Thus this algorithm can not be applied to traceroute probes. III. DEFINITIONS The Internet topology can be naturally modeled as an undirected graph G= (V, E), where V denotes the set of vertexes (nodes) and E is the set of edges (links). The sampled graph induced by sources S is a subgraph of G and we will use G S = (V S, E S ) to denote it. The fraction of nodes and links detected by a traceroute exploration is defined as follows: Definition 1: Given a graph G= (V, E) and the subgraph G S = (V S, E S ) induced by S, the node coverage of G by G S is the ratio VS ES. Similarly, the edge coverage is the ratio. V E Since this paper focuses on the placement of traceroute sources, we always need to compare the subgraphs induced by different sources. Given two sources S1 and S2 with the subgraphs G S1 and G S2 induced by each one, we define the intersection, union and subtraction of the two subgraphs as follows: Definition 2: GS1 GS2=(( VS1 VS2),( ES1 ES2)) GS1 GS2=(( VS1 VS2),( ES1 ES2)) GS1 GS2=(( VS1 VS2),( ES1 ES2)) Breadth-first search (BFS) is a simple algorithm to explore a graph, in which we start exploring from a node s in all possible directions, adding nodes one layer at a time. We will use L(s,v) to denote the layer in which v is explored in the BFS started from s. Definition 3: 0, if ( v= s) Lsv (, )= L(, sv') + 1,if ( v'v, ) E Using above definitions, we can define the number of nodes on each layer. Let N(s,l) denote the number of nodes on layer l. Definition 4: { } Ns,l ( )= vv VLs,v, ( )= l We use LMN(s) to denote the layer that contains the maximum number of nodes, i.e. the layer l that maximize N(s,l). IV. THE FEATURES OF TRACEROUTE PROBES In order to investigate the traceroute-like exploration process, we generate a graph based on the EBA model [17]. Then we analyze the features of traceroute and classify the nodes and edges that can not be detected by traceroute probes into two categories. A. The graph used for simulation A graph whose topological properties are close to the Internet is needed in order to simulate the traceroute probes. Faloutsos et al.[16] propose the power-law relationship of the Internet topology, which is widely believed to be the most important feature of many complex networks. The EBA model[17] proposed by Albert and Barab asi follows the power-law relationship. We choose EBA model to generate the graph for our simulation and use the typical parameters of EBA model. The main properties of the graph are summarized in table Ⅰ. 2

TABLE I. parameters of EBA model characteristics of the graph PARAMETERS OF EBA MODEL USED AND MAIN CHARACTERISTICS OF THE GRAPH m0 3 m 2 p 0.6 q 0.1 nodes 2000 edges 10985 average degree 10.98 maximum degree 128 The routing policy has to be decided to simulate the traceroute process. In the real Internet, there are many applied routing protocols, e.g. BGP and OSPF. The principle of routing protocols is to make packets in the network reach their destinations as soon as possible, but the actual routing path can be different from the shortest path due to commercial and political factors. Despite these factors, a reasonable approximation of the route traversed by traceroute-like probes is the shortest path between the source and the destination. In the case where there are equivalent shortest paths between two nodes, Dall'Asta et al.[12] define three routing selection mechanisms: USP(Unique Shortest Path), RSP(Random Shortest Path) and ASP(All Shortest Path). Actual traceroute probes may contain a mixture of these three mechanisms. However, we choose USP policy for our simulation because the USP procedure is the closest to the one time running of traceroute probes and represents the worst case scenario. In section 2.3, we will show how the USP policy affects the sampled result of traceroute probes. B. Marginal utility of adding traceroute sources It is shown in [11] that the marginal utility diminishes quickly when sources are added in a traceroute-like process. The authors also point out that the diminishing marginal utility does not imply that the coverage of nodes and edges is high. Since their experiments are based on the real trace data of skitter [2], whose exact topology graph is unknown, they are not able to analyze the coverage of traceroute process. We tackle this problem by simulation on the graph generated in the previous subsection. Fig. 1 shows the results of the marginal utility of adding sources. The sources and destinations of the simulation are randomly selected from the graph. Fig. 1(a) demonstrates the diminishing marginal utility when 1000 nodes are selected as destinations. In the case of 15 sources, 1054 nodes and 4136 edges are discovered, and the node and edge coverage is 52.7% and 37.6%, respectively. To eliminate the factor of destination selection, we perform the simulation in which all nodes in the network are regarded as destinations. The result is shown in Fig. 1(b). When there are 50 sources, the edge coverage is 57.5%, which is still a low percentage. This verifies the conclusion that the overall coverage can still be low even if the marginal utility diminishes quickly. In next subsection, we will investigate the reason of the low coverage by analyzing the features of traceroute probes. C. The features of traceroute probes Dall'Asta et al.[12] show that the probability of a node or an edge being detected depends on the betweenness centrality of the element and the density of sources and destinations. Since we focus on the placement of sources, we will assume that the destinations are plentiful and widely distributed in the graph. In fig. 1(b), we can see that the edge coverage of a single source exploration is only 26.6%, i.e. 73.4% of the edges are invisible from a single measurement source even if the destinations are plentiful. In the following subsections, we will investigate the invisible part of the graph and find out the reasons for the low coverage. 1) Cross-link The subgraph induced by a single source can be regarded as a spanning tree rooted at the source. If all the traceroute probes are started simultaneously, the traceroute process can be abstracted as the breadth-first search of the spanning tree and layer in the breadth-first search corresponds to hop in traceroute. Cross-link[18] is a link joining two nodes of the same layer, and thus can not be detected by the breadthfirst traverse of the spanning tree. For example, in fig. 2, the nodes and edges connecting H and D can not be detected by traceroute from source to dest1 and dest2 because H and D are on the same layer of the spanning tree. This problem is called cross-link. The usual way to fix the problem is to add sources. For example, these nodes and edges can be detected if H is also a traceroute source. 2) Equivalent shortest paths Besides cross-link, nodes and edges may also be undetected by traceroute probes for the existence of equivalent shortest paths. This is shown in fig. 3, where there are 4 equivalent shortest paths from source to dest1, but only one of them can be detected in one time exploration of traceroute. This corresponds to the USP policy. Although ASP may happen in a long time exploration, it is hard to detect all the equivalent shortest paths in one time or shortperiod exploration. Even in a long time exploration, many paths may not be detected because of backup routes, which is pervasive in real Internet. Adding sources is an efficient way to fix this problem. In fig. 3, if a source is placed on node D, more edges can be detected when D traceroute to dest2. We present the traceroute exploration result of a single source in table Ⅱ. In the case that all the nodes in the graph are destinations, the edges detected by a single source is only 22.7% compared with the original graph. We can see that the edge coverage of single-source traceroute exploration is very low even if the destinations are plentiful. This is an important feature of traceroute exploration. Cross-links and equivalent shortest paths both contribute to the low coverage. 3

(a) Number of destinations = 1000 (b) All nodes are destinations Figure 1. Number of nodes and edges detected as sources are added source source A B C Figure 2. Illustration of cross-link A B C D E F G H D E F G H dest1 dest2 dest1 dest2 Figure 3. Illustration of traceroute probes with equivalent shortest paths TABLE II. THE EDGES OF THE SAMPLED GRAPH INDUCED FROM A SINGLE SOURCE total edges 10985 percentage edges discovered 2494 22.7% edges invisible due to crosslink 5050 45.9% edges invisible due to 3441 31.3% equivalent shortest paths V. THE RELATIONSHIP BETWEEN PLACEMENT OF TRACEROUTE SOURCES AND THEIR SAMPLED RESULT As mentioned in the previous section, the subgraph induced by a single source is actually a spanning tree rooted at the source. The overall topology graph is obtained by merging the sampled results of each source. Work by [7] show that there is not much difference among the coverage of the subgraphs induced by each source. So, if the number of sources is fixed, the coverage of the merged graph actually lies on the size of the intersection of every pairs of the subgraphs. For example, given two sources S1 and S2 with the subgraphs G S1 and G S2, if S1 and S2 are exactly the same, the merged graph of S1 and S2( GS1 GS2 ) is the same as S1 or S2; on the contrary, if S1 is entirely different from S2 ( GS1 GS2= ), the size of merged graph is the sum of S1 and S2. In this section, we will first examine how to minimize the intersection of two sampled graphs based on the feature of traceroute probes, and then we will validate our method using both the simulated graph and the trace data of skitter[2]. A. Minimize the intersection of two sampled graphs This subsection is devoted to the following problem: Given a traceroute source S1 and the subgraph G S1 induced by it, how to place the second source S2 so that the intersection of G S1 and G S2 can be as small as possible, i.e. S2 can detect more nodes and edges which are invisible from S1. It is known that a graph consists of nodes and edges. Usually the coverage of nodes is proportional to the coverage of edges, so we only take the coverage of edges into account and assume that all nodes in the original graph are destinations. From the previous section, we know that the two origins of the undetected edges are cross links and equivalent shortest paths. We will analyze them respectively. 1) Cross-link Cross-link is the link joining two nodes of the same layer and thus can not be detected by the source. If S2 is placed on layer N, cross-links on layer N can be detected by the 4

exploration from S2 to the other nodes on layer N. Since the number of cross-links on a certain layer is proportional to the number of nodes on the layer, we can place S2 on the layer that contains the maximum number of nodes, i.e. LMN (S1). 2) Equivalent shortest paths Edges of a network can be divided into two sets: edge joining two nodes of the same layer and edge joining two nodes belonging to adjacent layers. The former one is actually the set of cross-links and none of these edges can be detected by traceroute probes. Edges in the latter set can partly be detected. If more than one node in the same layer are joined to a node in the next layer, there will be equivalent shortest paths to the node and only one of these paths can be detected by traceroute probes. In fig. 3, node A and B in layer 1 are both connected to D in layer 2, so there are two equivalent shortest paths from source to D. The statistic of the latter set on each layer in fig. 4 demonstrates that the undetected edges of each layer are proportional to the total edges. Lakhina et al. [7] show that traceroute probes are more likely to find edges very close to the source. So if S2 is placed on the layer that has maximum number of connections to its adjacent layer, more edges could be detected. 3) How to place S2 From the analysis of the previous subsections, we can conclude that in order to detect more cross-links, S2 should be placed on layer LMN (S1), while if S2 is placed on the layer that has the maximum number of connections to its adjacent layer, it can detect more edges that are invisible from S1 due to the existence of equivalent shortest paths. Fortunately, the two layers are always the same, which we demonstrate in fig. 5. Therefore, S2 should be placed on the layer LMN (S1) so that S2 can detect more edges that are invisible from S1. To validate our conclusion, we randomly select four sources as S2 from each layer and run traceroute explorations from them. Fig. 6 plots the comparison results between S1 and S2. Note that the source S1 in fig. 6 is exactly the same one as in fig. 4 and 5. We can observe that sources in layer 3, which contains the maximum number of nodes (shown in fig. 5), can always detect more links that can not be detected by S1. Until now, we have made the assumption that all the nodes in the original graph are destinations in the traceroute exploration process. We also validate our conclusion in the case that only parts of the nodes are destinations. Fig. 7 demonstrates that sources on the layer with the maximum number of nodes can also detect more nodes that S1 can not detect. B. Analysis of raw trace data of skitter We have made the conclusion that the second source S2 should be placed on layer LMN (S1). In this section, we will validate our conclusion by analyzing the raw trace data of skitter. There are 25 sources of skitter running independent probes to more than 971000 destinations. Each source needs two or three days to complete a probe cycle, so the start time of a cycle is different among all sources. We choose 11 sources which started a new cycle on Nov 2, 2006 and list the exploration result of each source in table Ⅲ. It can be observed that there is not much difference among the numbers of nodes and edges discovered by different sources except some abnormal ones, e.g. the source named h-root. To eliminate the factors that affect the comparison between sources, we will exclude those abnormal sources (cdgrssac,h-root,i-root,uoregon). The comparisons between the other sources are shown in fig. 8. Each graph plots the comparison result between a certain source S1 (in a, b and c, the source S1 is cdg-rssac, champagne and f-root, respectively) and the other ones (S2). The four curves represents nodes or links detected on layer L(M1,M2), and additional nodes or links detected by M2. We can see that the four curves exhibit the same trend in all graphs. Fig. 8 thus supports the conclusion made by us. VI. THE METHOD TO PLACE SOURCES A. The method From the analysis and experiments in section 3, we conclude that given a source S, another source should be placed on the layer LMN(S) such that it can detect more nodes and edges that can not be detected by S, and the intersection of their sampled results can be smaller. Based on this conclusion, we propose an iterative method to select sources. The key idea of the method is: whenever adding a new source, the intersections of the sampled results of the new source and each existing source should be as small as possible. The detailed procedure is: Initialization. Randomly select a node from the original graph as the first source S and run a traceroute exploration from S. Sort all the nodes detected according to their path length from S, i.e. their layers. Initialize set M to include all the nodes in layer LMN(S). If M is not empty, randomly select a node S from M as the next source, and erase the node from M; else, terminate. Use the source S selected in step 2 to run a traceroute exploration and sort all nodes detected. Initialize set M to include all the nodes in layer LMN(S ) Set M to be the intersection of M and M. Go to step 2. 5

Figure 7. Comparison of number of nodes detected between S1 and S2, number of destinations=1000 Figure 4. Edges of the layer represents the total number of connections to adjacent layer; edges undetected represents the number of undetected connections to adjacent layer Figure 5. Correlation between nodes and edges on each layer. edges detected by S2 but not detected by S1 1400 1200 1000 800 600 400 200 0 1 2 3 4 5 layer Figure 6. Comparison of number of links detected between S1 and S2 nodes detected by S2 but not detected by S1 160 140 120 100 80 60 40 20 0 1 2 3 4 5 layer Figure 8. Comparison results between sources. The four curves represent the nodes/links on layer L(M1,M2) and the nodes/links detected by M2 but not by M1. The first source S is randomly selected in step 1 and set M is initialized to contain all the nodes on layer LMN(S). The other steps ensure that whenever adding a new source, the intersection of the subgraph induced by the new source and each existing source should be as small as possible. 6

TABLE III. EXPLORATION RESULT OF EACH SOURCE OF SKITTER Source Nodes detected Edges detected cdg-rssac 441117 528179 f-root 516695 613610 h-root 330862 401380 i-root 432354 511793 champagne 497823 593135 uoregon 460739 547773 arin 512445 609243 sjc 529833 630252 riesling 482590 572730 mwest 496489 588622 lhr 504501 594219 B. Simulation on the modeled graph We simulate the traceroute exploration using the sources generated by the method proposed in the previous section and compare it with sources randomly selected. Dall'Asta et al. [12] suggest that traceroute sources should be placed on low-connectivity nodes, i.e. nodes with low degrees. So we also simulate the exploration by sources with lowest degrees in the graph. The comparison results are plotted in fig. 9 and 10, showing that our method is better than the other two. In case of 15 sources and 1000 destinations (fig. 9), the sources generated by our methods can detect 57 (2.8% of the original graph) additional nodes and 1630(14.8% of the original graph) additional links compared with the ones randomly selected. While in the case of all the nodes as destinations (fig. 10), our method can detect 1986 additional links (18% of the original graph). We can also find that the sources with lowest degrees can only slightly increase the node and edge coverage compared with sources randomly selected. We also perform the above experiments on graphs generated by other models and topology generators, e.g. the GL model[19], the inet topology generator[20], even ER random graphs[21]. Most of the comparison results are similar except the one of MR model[22], which we demonstrate in Fig. 11. Unlike the other graphs, our method shows similar performance with the other two selection methods. This is because the marginal utility of adding sources on this graph is very high. Actually, 19 sources randomly selected can detect nearly 90% of all links. This is a big surprise due to the analysis of MR models in [13]. However, we assume this is an extreme and very unusual phenomenon. Further studies should focus on how the characteristics of graph affect the performance of our method. From the above experiments, we conclude that in most cases, our method can increase the coverage of the traceroute exploration, especially the link coverage. Note that, the improved node coverage in fig. 9 is only 2.8%, which is very low compared with the link coverage. That is because the marginal utility of node coverage decreases very quickly (shown in fig. 1) due to the low density of destinations. It also demonstrates that the number of destinations should be increased in order to obtain a high coverage of nodes. VII. CONCLUSION We analyze the features of traceroute probes in this paper. Based on the features, we identify the two reasons that contribute to the low coverage of traceroute-like explorations. By analyzing the relationship between the placement of traceroute sources and the intersection between the subgraphs induced by the sources, we propose an iterative method on how to place the traceroute sources such that the overall coverage of the exploration can be as higher as possible, i.e. the exploration process can detect as more nodes and edges as possible. We also validate our method on graphs generated by different models. We hope that the method we proposed can provide helpful guidance to the deployment of traceroute sources in real Internet topology discovery and other projects using traceroute-like probes. We have validated the relationship between the placement of sources and the subgraphs induced by them using the raw trace data of skitter. The result shows that the relative location between sources can affect the results of the merged graph. So, whenever adding a new source, we should make sure that the new source can detect more nodes and edges that are invisible from the existing sources. ACKNOWLEDGMENT The raw trace data of skitter in this research help us validate our conclusion and analyze the real Internet topology discovery process. We would like to thank the operators of skitter project. References [1] Whois tool. http://www.ripe.net/ris/riswhois.html [2] Cooperative Association for Internet Data Analysis Skitter tool. http://www.caida.org/tools/measurement/skitter/ [3] The National Laboratory for Applied Network Research (NLANR). http://moat.nlanr.net/ [4] Topology project. http://topology.eecs.umich.edu/ [5] SCAN project. http://www.isi.edu/div7/scan/ [6] Internet mapping project at Lucent Bell Labs. http://www.cs.belllabs.com/who/ches/map/ [7] A Lakhina, JW Byers, M Crovella, and P Xie. Sampling biases in IP topology measurements. In IEEE INFOCOM, 2003. [8] DIMES project. http://www.netdimes.org/ [9] B Donnet, P Raoult, T Friedman, M Crovella. Efficient algorithms for large-scale topology discovery. ACM SIGMETRICS Performance Evaluation Review, 2005. [10] Shavitt, E Shir. DIMES: Let the Internet Measure Itself. ACM SIGCOMM Computer Communication Review, 2005 [11] P Barford, A Bestavros, J Byers, M Crovella. On the Marginal Utility of Network Topology Measurements. Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, 2001. 7

(a) Number of destinations = 1000 (b) All nodes are destinations Figure 9. Comparison between three methods of selecting sources, number of desinations=1000 Figure 10. Comparison between three methods of selecting sources, all nodes are destinations [12] L. Dall'Asta, I. Alvarez-Hamelin, A. Barrat, A. Vázquez, and A. Vespignani. Exploring networks with traceroute-like probes: Theory and imulations. Theoretical Computer Science, Special Issue on Complex Networks, 2005. [13] JL Guillaume, M Latapy. Relevance of massively distributed explorations of the internet topology: Simulation results. In IEEE INFOCOM, 2005. [14] S Jamin, C Jin, Y Jin, D Raz, Y Shavitt, L Zhang. On the placement of Internet Instrumentation. In IEEE INFOCOM, 2000. [15] Paul Francis, Sugih Jamin, Vern Paxson, Lixia Zhang, Daniel Gryniewicz, and Yixin Jin. An architecture for a global internet host distance estimation service. In IEEE INFOCOM, 1999. [16] C. Faloutsos, P. Faloutsos, and M. Faloutsos. On Power-Law Relationships of the Internet Topology. In Proceedings of the ACM SIGCOMM, 1999. [17] R eka Albert and Albert-L aszl o Barab asi. Topology of Evolving Networks: Local Events and Universality. Physical Review Letters, 2000 [18] R Siamwalla, R Sharma, S Keshav. Discovering Internet Topology. In IEEE INFOCOM, 1999. [19] JL Guillaume, M Latapy. Bipartite graphs as models of complex networks. Physica A. [20] Inet, currently at version 3.0, is an Autonomous System (AS) level Internet topology generator. topology.eecs.umich.edu/inet. [21] P Erdös, A RényiOn random graphs I. Publ. Math. Debrecen, 6:290ñ297, 1959. [22] M. Molloy and B. Reed. The size of the giant component of a random graph with a given degree sequence. Combin. Probab. Comput., pages295ñ305, 1998. Figure 11. Comparison between three methods of selecting sources, on MR graph 8