2013 Third International Conference on Intelligent System Design and Engineering Applications

A Novel ALTO Scheme for BitTorrent-Like P2P File Sharing Systems

Liu Guanxiu, Ye Suqi, Huang Xinli
Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China
gxliu920@gmail.com, ysq227@163.com, xlhuang@cs.ecnu.edu.cn

Abstract

Peer-to-Peer (P2P) file sharing is one of the most widely used P2P applications; its traffic accounts for about 60% to 80% of all Internet traffic. BitTorrent is a representative P2P file sharing system. In such systems, neighbor selection and piece selection rely mainly on knowledge of the overlay topology and ignore the underlying Internet topology, which can cause excessive inter-ISP (Internet Service Provider) traffic and seriously degrade the performance of the whole system. To overcome these problems, we design a new scheme, inspired by the idea of Application-Layer Traffic Optimization (ALTO), to improve the performance of BitTorrent-like P2P file sharing systems. First, we propose a novel approach that makes BitTorrent nodes aware of the topology of the underlying network. Then we replace BitTorrent's original algorithms with three new localized algorithms based on autonomous system (AS) hops. Finally, we conduct comprehensive experiments on the General Peer-to-Peer Simulator (GPS) to verify the correctness and effectiveness of our scheme. The simulation results show that, with our scheme, nodes in BitTorrent-like systems have a better sense of the topology of the underlying network and interact more efficiently. In addition, our scheme helps to decrease inter-AS and inter-ISP traffic, optimize traffic distribution across the whole network, and improve the quality of experience of P2P users.

Keywords-Peer-to-Peer Networks; File Sharing Systems; Traffic Optimization; BitTorrent; AS Hops

I. INTRODUCTION

Peer-to-Peer (P2P) systems have gained tremendous popularity in the past few years. More and more users are joining such systems and more resources are being made available, which attracts even more users. P2P applications, including file sharing systems such as BitTorrent and eDonkey, account for about 60% to 80% of all Internet traffic. This ever-increasing traffic not only brings enormous revenue to Internet Service Providers (ISPs) but also poses a significant engineering challenge to them. By building overlay networks that are oblivious to the underlying Internet topology and routing, P2P systems use network resources inefficiently, and P2P traffic often starves other applications, such as web traffic, of bandwidth. In fact, previous studies have shown that many requested P2P objects were downloaded from peers outside the local network even though they were available locally. ISPs have to pay more for this inter-ISP traffic, which increases their operating cost.

BitTorrent-like P2P file sharing systems generate so much inter-ISP traffic because they choose neighbors at random and do not consider the impact of data transmission between peers in different ISPs, which may degrade the performance of the whole network. In addition, the limited inter-ISP links may be occupied by the vast inter-ISP traffic, overloading the ISPs and provoking them to react against P2P traffic. Meanwhile, the inter-ISP links are often the heavily congested bottleneck of the network. Consequently, the download rate of P2P systems may suffer, and the usefulness of the systems is weakened. All of this may result in serious performance degradation of the whole system.

ISPs usually limit the traffic of BitTorrent-like P2P file sharing systems by closing download ports and limiting bandwidth in order to decrease operating cost.
These approaches only reduce the data transmission rate; they cannot fundamentally reduce the total volume of P2P traffic. Moreover, the links between ISPs are limited. These links are important because they carry both data and routing information, so they may become bottlenecks when traffic is heavy.

Many solutions have been proposed, such as estimating topology information by predicting network distance in terms of latency, or constructing entities that feed underlying topology information back to the overlay. In this paper we adopt part of the latter idea and cluster P2P nodes by the Autonomous System (AS) they belong to. Under the administration of cluster leaders, the load on P2P nodes is lightened. To guide the behavior of all peers from a global view, we also construct resource indexing entities, which can initiate a load balancing process based on the information they hold. We perform event-based simulations to evaluate the performance of our algorithms with the BitTorrent protocol. The results show that our scheme can improve the performance of P2P file sharing systems.

The rest of this paper is organized as follows. In Section II, we introduce the Application Layer Traffic Optimization (ALTO) problem, BitTorrent-like P2P file sharing systems and some existing solutions. Our improved algorithms are described in Section III. In Section IV, we evaluate the performance of our algorithms using event-driven simulation. Section V concludes the paper.

978-0-7695-4923-1/12 $26.00 © 2012 IEEE. DOI 10.1109/ISDEA.2012.39

II. RELATED WORK

A. The Application Layer Traffic Optimization (ALTO [1]) Approach

As is often the case, P2P systems ensure that popular content is received by the peers in the overlay. However, most P2P systems employ an arbitrary peer selection policy that ignores the underlying Internet topology and ISP link costs, establishing connections between randomly chosen subsets of cooperating peers from around the world. This behavior may lead to suboptimal choices. Most distributed hash tables (DHTs) use greedy forwarding algorithms to reach their destination, making decisions that may not turn out to be globally optimal. This naturally leads to the ALTO problem: how best to provide the topology of the underlying network while allowing the requesting node to use that information to effectively reach the node on which the content resides. It would thus appear that P2P networks, with their application-layer routing strategies based on overlay topologies, are in direct competition with Internet routing and topology [2].

One way to solve the ALTO problem is to build network coordinate systems, which embed the network topology into a low-dimensional coordinate space and enable network distance estimation based on latency. This approach is not free of errors, because latency fluctuates, which affects the accuracy of the distance estimates. Furthermore, when routing in a low-dimensional coordinate space, the problem of local minima [3] is inevitable.

Instead of building network coordinate systems at the application level through distributed measurements, the topology information could also be provided by ISPs or other entities, since they have full knowledge of the topology of the networks they operate. To avoid congestion on critical links, they have an interest in helping applications optimize the traffic they generate. The P4P architecture proposed in [4] has been adopted by many P2P software distributors and technology researchers with the goal of optimizing the utilization of network resources. The key role in the P4P architecture is played by servers called iTrackers, which are deployed by ISPs and accessed by P2P applications (or P2P systems) in order to make optimal decisions when selecting a peer to connect to.

B. The BitTorrent-Like P2P File Sharing Systems

Since Napster appeared in 1999, many P2P file sharing systems have been developed, such as Gnutella, KaZaA, FastTrack, BitTorrent and eDonkey. BitTorrent, which is seen as a revolutionary download tool, inherits many technical advantages of earlier P2P file sharing systems and has become one of the most popular P2P file sharing systems today. The measurements in [5] show that the total traffic produced by BitTorrent is more than half of that produced by all P2P file sharing systems. The rapidly growing user base and traffic of BitTorrent have attracted the attention of many institutions.

Research on BitTorrent can be classified as follows. First, BitTorrent technology has been applied to many other domains. The authors of [6] built a CDN system on a P2P network and designed a novel content-based routing protocol that can support large-scale and highly dynamic distributed applications. The authors of [7] embedded BitTorrent technology into video-on-demand (VOD) streaming and obtained good performance. Second, researchers have studied and simulated novel algorithms on BitTorrent itself. The authors of [5] analyzed the BitTorrent system in 4 respects: availability, the response of the system when the number of users increases rapidly, integrity, and download performance. The authors of [8] analyzed the traffic, load and characteristics of the BitTorrent network.

At present there are many P2P file sharing systems. Many of them are in fact packaged on top of BitTorrent with an added GUI. Some P2P applications add functions to the BitTorrent core that solve common problems encountered when using BitTorrent systems.

The piece selection and peer selection algorithms are very important in BitTorrent file sharing systems and can affect their performance enormously. The main piece selection algorithms of BitTorrent are Strict-priority, Rarest first, Random first piece and Endgame mode.
The main peer selection algorithms are the Random algorithm, the Choke/Unchoke algorithm and the Optimistic unchoking algorithm. Our scheme improves some of these algorithms.

III. ALTO FOR BITTORRENT-LIKE P2P FILE SHARING SYSTEMS

According to the Choke/Unchoke algorithm, a BitTorrent peer prefers the peers with the fastest download or upload rates as its candidate neighbors. However, BitTorrent's neighbor selection itself is random, so there is a high chance that a peer selects neighbors in external networks, which probably leads to much unnecessary inter-ISP traffic. To overcome these problems, we describe below how the AS topology is generated and how the algorithms are improved.

A. The Generation of AS Topology

To keep BitTorrent traffic within the local network using the ALTO approach, BitTorrent must learn some information about the underlying topology. A peer's AS number can be obtained from its IP address through public AS lookup services, so it is not difficult to determine which AS an IP address belongs to. After collecting this information, we have a table that maps peers' IP addresses to their AS numbers, i.e., an AS-IP map table.

Next we show how to obtain an integrated AS topology map from this table. If all BitTorrent peers belong to N ASes, we select one peer at random from every AS and run traceroute between every two selected peers to obtain their end-to-end route at the IP layer (at least one run per pair of ASes). If traceroute fails to return, we select two other peers belonging to the same two ASes and try again. We then look up the AS number of every IP address on the route in the AS-IP map table, so that the end-to-end route is seen to pass through a linear sequence of ASes. Treating consecutive IP addresses in the same AS as a single AS node converts the IP-layer route into an AS-level route, and combining these routes yields an integrated AS topology map.
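The conversion from IP-layer routes to an AS-level map described above can be illustrated with a short sketch. It is not the authors' implementation: the function names, the dictionary-based AS-IP map table, the choice to skip hops that have no entry in the table, and the treatment of AS links as undirected are all assumptions made for illustration. The last function applies the Floyd-Warshall all-pairs shortest-path algorithm to the resulting map to produce a table of shortest AS hops.

```python
from itertools import groupby


def ip_route_to_as_path(ip_route, ip_to_as):
    """Collapse an IP-layer route into an AS-level route.

    ip_route: list of hop IP addresses, e.g. parsed from traceroute output.
    ip_to_as: the AS-IP map table as a dict {ip: as_number}.
    Hops missing from the table are skipped (an assumption of this sketch).
    """
    as_sequence = [ip_to_as[ip] for ip in ip_route if ip in ip_to_as]
    # groupby merges runs of consecutive equal AS numbers into one AS node.
    return [as_number for as_number, _ in groupby(as_sequence)]


def build_as_links(as_paths):
    """Collect AS adjacencies from AS-level routes (treated as undirected)."""
    links = set()
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            links.add((a, b))
            links.add((b, a))
    return links


def shortest_as_hops(as_numbers, links):
    """Floyd-Warshall: shortest AS hop count between every pair of ASes."""
    inf = float("inf")
    hops = {
        a: {b: 0 if a == b else (1 if (a, b) in links else inf)
            for b in as_numbers}
        for a in as_numbers
    }
    for k in as_numbers:          # intermediate AS
        for i in as_numbers:      # source AS
            for j in as_numbers:  # destination AS
                if hops[i][k] + hops[k][j] < hops[i][j]:
                    hops[i][j] = hops[i][k] + hops[k][j]
    return hops
```

For a route whose hops map to ASes 0, 0, 1, 2, 2, `ip_route_to_as_path` returns `[0, 1, 2]`, and the hop table built from that single route gives `hops[0][2] == 2`. The triple loop costs O(N^3) for N ASes, which stays small because N counts ASes rather than peers.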
Finally, using the Floyd algorithm, we compute a table of the shortest hop counts between every pair of ASes and store it on the trackers.

B. Algorithm Design

1) Tracker Localized Algorithm

To make trackers take locality into account, we modified the tracker's random peer selection algorithm. When the tracker receives a request from a peer, it first sorts the candidate peers by their AS hops to the requesting peer. The candidate peers are usually kept in a table managed by the tracker. We suppose the default number of candidate peers in the table is 50. If the table holds 50 or fewer candidate peers, the tracker sends the whole table to the requesting peer directly. If it holds more than 50 and the AS number of the 50th peer differs from that of the 51st, the tracker returns the first 50 peers. If the AS number of the 50th peer is the same as that of the 51st, we record this AS number as n and remove peers whose AS number is n until the AS number of the 50th peer differs from that of the 51st; the tracker then returns the first 50 peers. Finally, the requesting peer requests the file directly from the peers returned by the tracker.

2) Chocker Localized Algorithm

In the original BitTorrent, a seed unchokes the 4 downloading peers with the highest download rates, and a downloading peer unchokes the 4 uploading peers with the highest upload rates. This behavior ignores the physical location of peers and usually brings extra inter-ISP traffic. We therefore modified the choking/unchoking algorithm so that peers unchoke their neighbors based on AS hops. Although the distance between two peers seldom changes, the modified algorithm does not make a peer always select the same neighbors. The reason is that the choking/unchoking algorithm selects peers only among its interested neighbors, that is, among neighbors whose piece collections differ from that of the requesting peer. Because the piece collection of every peer changes continuously, the set of interested neighbors of each peer changes continuously too, and the algorithm ensures that a peer always exchanges data with the closest of them.

We deploy a similar choking/unchoking policy on seeds. A seed keeps unchoked only its 4 closest neighbors that have not yet finished downloading.
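The two distance-based selection rules described so far in this section can be sketched as follows. This is an illustrative sketch rather than the authors' code: the data layout, the function names `tracker_select` and `closest_unchoke`, and the format of the `hops` table are hypothetical. The tracker sketch only sorts by AS hops and truncates to the default list size of 50; it deliberately does not reproduce the additional rule for the case where the 50th and 51st peers share an AS number, because the text does not say which peers of that AS are removed.

```python
DEFAULT_LIST_SIZE = 50  # default number of candidate peers given in the text
UNCHOKE_SLOTS = 4       # number of neighbors kept unchoked in the text


def tracker_select(requester_as, candidates, hops, limit=DEFAULT_LIST_SIZE):
    """Return up to `limit` candidate peers, fewest AS hops first.

    candidates: list of (peer_id, as_number) tuples (hypothetical layout).
    hops: nested dict where hops[a][b] is the shortest AS hop count a -> b.
    """
    # sorted() is stable, so peers at equal distance keep their table order.
    ranked = sorted(candidates, key=lambda c: hops[requester_as][c[1]])
    return ranked[:limit]


def closest_unchoke(own_as, neighbors, hops, slots=UNCHOKE_SLOTS):
    """Pick the neighbors to unchoke by AS distance instead of transfer rate.

    neighbors: list of dicts with keys "id", "as", "interested", "finished"
    (hypothetical layout). Only interested neighbors that have not finished
    downloading are eligible, as in the policy described for seeds.
    """
    eligible = [n for n in neighbors if n["interested"] and not n["finished"]]
    eligible.sort(key=lambda n: hops[own_as][n["as"]])
    return [n["id"] for n in eligible[:slots]]
```

Both functions read the same precomputed hop table, so the per-request cost is one sort of the candidate or neighbor list and no network measurement is needed at selection time.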
Only after one of these 4 neighbors has downloaded all pieces of the file does the seed unchoke the next closest neighbor. This is opposite to the design of the original BitTorrent, which gives priority to the neighbors with the highest download/upload rates; our algorithm is based on distance and ensures that a seed sends pieces to its closest neighbors first. If the upload rates of all peers that share the same interest are similar, the peer closest to the seed has the highest probability of finishing first and becoming a seed itself. Seeds thus spread outward from the center.

3) Piecepicker Localized Algorithm

The piece selection policy in BitTorrent includes Strict-priority, Rarest first, Random first piece and Endgame mode. Among these, Rarest first is the main cause of inter-ISP traffic. We deploy the Piecepicker localized algorithm in place of Rarest first to encourage a peer to download first the pieces closest to itself. The downloading peer first computes a distance value for each piece, defined as the average number of AS hops between the downloading peer and the peers that own the piece. For instance, if 3 peers own a piece that the downloading peer needs and their AS hops to the downloading peer are 0, 2 and 4, the distance value of this piece is (0 + 2 + 4) / 3 = 2. The Piecepicker localized algorithm downloads the piece with the smallest distance value first. Unlike the Rarest first policy of the original BitTorrent, which aims to improve the dispersal of the rarest pieces in the swarm, the Piecepicker localized algorithm aims to make pieces spread outward from the seeds, AS hop by AS hop.

IV. EVALUATION AND PERFORMANCE ANALYSIS

A. Experimental Setup

To verify the feasibility of our scheme, we use the Java-based General Peer-to-Peer Simulator (GPS) [9] as our simulation platform.
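For reference when reading the evaluation, the distance value used by the Piecepicker localized algorithm of Section III can be sketched as below. The function and parameter names are hypothetical, and two behaviors are assumptions that the paper does not specify: pieces with no known owner are skipped, and when several pieces have the same distance value the first one encountered is chosen.

```python
def piece_distance(owner_hops):
    """Average AS hops from the downloading peer to the owners of one piece."""
    return sum(owner_hops) / len(owner_hops)


def pick_closest_piece(needed_pieces, owners_by_piece, hops_to_peer):
    """Return the needed piece with the smallest distance value, or None.

    needed_pieces: iterable of piece indices the downloader still lacks.
    owners_by_piece: dict {piece: [peer_id, ...]} of known owners.
    hops_to_peer: dict {peer_id: AS hops from the downloader to that peer}.
    """
    best_piece, best_distance = None, None
    for piece in needed_pieces:
        owners = owners_by_piece.get(piece, [])
        if not owners:
            continue  # assumption: a piece nobody offers cannot be picked
        distance = piece_distance([hops_to_peer[o] for o in owners])
        # Strict "<" keeps the first piece seen when distances are equal.
        if best_distance is None or distance < best_distance:
            best_piece, best_distance = piece, distance
    return best_piece
```

With owners at 0, 2 and 4 AS hops, `piece_distance([0, 2, 4])` evaluates to 2.0, matching the worked example in Section III.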
GPS is a message-level, event-driven P2P simulation framework that aims to model P2P protocols accurately and efficiently in a realistic and dynamic environment. It can also simulate the topology and characteristics of networks precisely.

First, we randomly generate 1024 nodes on the topology that GPS simulates and distribute them among AS 0, AS 1, AS 2, AS 3 and AS 4. The AS topology is shown in Figure 1. Of these 1024 nodes, 32 are core nodes, which are only responsible for forwarding data. The bandwidth of the links between core nodes varies uniformly between 100 Mbps and 200 Mbps. The other 992 nodes are edge nodes. The bandwidth of the links between core nodes and edge nodes varies uniformly between 20 Mbps and 100 Mbps. Of the edge nodes, 512 are peers, and some of these 512 peers are seeds (the size of the shared file is 50 MB). We then let a varying number of peers start downloading the file at random every 10 seconds.

Figure 1. The AS topology in our simulations

To make our experimental results more credible, we set up several scenarios with 16, 32, 45, 64, 128 and 512 peers and run more than 5 rounds in every scenario. In each round, all conditions, including the positions of nodes, the bandwidth and delay of links, the seeds and the download events, are generated randomly within given ranges; only the number of nodes and peers, the file size, the number of edges, the number of ASes and the AS topology are fixed. Finally, for each scenario with N download peers, we average the experimental results over its rounds and compare them with those of the original algorithm.

B. Performance that Users Care About

In this part we compare the performance metrics that users care about: the average download time of peers and the breaking time of peers. The results are presented as follows.

Figure 2 shows the average download time of peers for each localized algorithm and for the original BitTorrent algorithm. The Tracker localized algorithm reduces the download time the most, but when there are few download peers, the download time does not decrease and instead increases slightly. This is because peers in the original BitTorrent select neighbors at random to exchange data. When there are only a few download peers, the inter-AS links have enough bandwidth, so the download rate is higher. After the trackers are localized, however, peers always select local neighbors. Because the bandwidth of local links is smaller than that of inter-AS links, data transmission is slower on local links. As the number of download peers increases, peers in the original BitTorrent still select neighbors at random, which may congest the inter-AS links and degrade the performance of the whole network, whereas the Tracker localized algorithm keeps traffic local and uses bandwidth effectively.

Figure 2. Average download time of peers for each localized algorithm and the original BitTorrent algorithm

In addition, Figure 2 shows that the Chocker localized algorithm reduces the download time of peers slightly more than the Piecepicker localized algorithm. When there are only a few download peers, the average download time decreases a little with these two localized algorithms. The reason is that peers always unchoke the closest among their random neighbors, or download the closest pieces, which raises download efficiency considerably even when there are few seeds. When there are many download peers, more and more seeds appear as peers finish downloading; the reduction in average download time with the Chocker and Piecepicker localized algorithms becomes slightly larger and tends to stay steady.

If a peer cannot find some pieces of a file during downloading, the download stays in an interrupted state until the peer has received those pieces. Figure 3 shows this breaking time of peers. The breaking time is the smallest with the Chocker localized algorithm, followed by the Piecepicker localized algorithm. However, the breaking time is the largest when using the Tracker localized algorithm. As the number of download peers increases, the improvement in breaking time of all localized algorithms becomes more and more obvious.

Figure 3. The time of all peers breaking downloading for each localized algorithm and the original BitTorrent algorithm

C. Performance that ISP Cares About

In this part, we present the performance metrics that ISPs care about: the inter-AS traffic and the traffic over different AS hops. We compare the original BitTorrent and the improved algorithms on these metrics. The results are shown as follows.

Figure 4. The traffic of inter-ISP for each localized algorithm and the original BitTorrent algorithm

Figure 4 shows the traffic on inter-AS links. All three localized algorithms reduce inter-AS traffic, and to a similar extent. It is worth noting that the reduction becomes more obvious as the number of download peers increases. This is because, after downloading has gone on for a period of time, there are many seeds in every AS, so peers can download the file within the AS they belong to. Figure 4 also shows that the Piecepicker localized algorithm reduces inter-AS traffic the most when there are many download peers. The reason is that the number of seeds in an AS is constantly increasing, which lets the Piecepicker localized algorithm optimize the performance further.
To show the traffic over different AS hops under the different algorithms, we fix the number of download peers at 64, keep the other conditions unchanged, and measure the traffic for each AS hop count. For example, 3-AS-hop traffic means the traffic between seeds and download peers that are 3 AS hops apart. As shown in Figure 5, the traffic volume over different AS hops is similar across the localized algorithms. The Chocker localized algorithm brings more local traffic than the other algorithms. In addition, compared with the original BitTorrent, the 0-AS-hop traffic increases considerably with all three localized algorithms, which indicates that our scheme can effectively decrease inter-ISP traffic.

Figure 5. The traffic volume of different AS hops

V. CONCLUSION AND FUTURE WORK

In this paper, we presented the design, implementation and evaluation of an effective and scalable approach, inspired by the idea of ALTO, for improving the performance of BitTorrent-like P2P file sharing systems. First, we proposed a novel approach that makes BitTorrent nodes aware of the topology of the underlying network. Then we modified BitTorrent's original algorithms for choosing neighbors, selecting pieces and choking/unchoking, and replaced them with the Tracker localized algorithm, the Chocker localized algorithm and the Piecepicker localized algorithm. Finally, we conducted comprehensive experiments on GPS, a well-known P2P simulator, to verify the correctness and effectiveness of our scheme. The experimental evaluation has shown that, with our scheme, nodes in BitTorrent-like systems have a better sense of the topology of the underlying network and interact more efficiently. In addition, our scheme helps to decrease inter-AS and inter-ISP traffic without sacrificing performance, optimize the traffic distribution across the whole network, and improve the quality of experience of P2P users.

Much work remains to be done to perfect our scheme. For instance, we can add other strategies to the scheme, take file fragmentation into account, and predict which kinds of resources users are likely to be interested in next by analyzing their behavior. All of this requires deeper research.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (Grant No. 60803127) and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 200802691053).

REFERENCES

[1] J. Seedorf, S. Kiesel, M. Stiemerling, Traffic Localization for P2P-Applications: The ALTO Approach, In: Ninth International Conference on Peer-to-Peer Computing (IEEE P2P 2009), pp. 171-177, Seattle, 2009.
[2] I. Rimac, V. Hilt, M. Tomsu, V. Gurbani, E. Marocco, A Survey on Research on the Application-Layer Traffic Optimization (ALTO) Problem, RFC 6029 (Informational), 2010(10).
[3] Q. Fang, J. Gao, L.J. Guibas, V.D. Silva, L. Zhang, GLIDER: Gradient Landmark-Based Distributed Routing for Sensor Networks, In: IEEE INFOCOM, vol. 1, pp. 339-350, Florida, 2005.
[4] H.Y. Xie, A. Krishnamurthy, A. Silberschatz, Y. Yang, P4P: Explicit Communications for Cooperative Control Between P2P and Network Providers, P4PWG Whitepaper, 2007(5).
[5] J. Pouwelse, P. Garbacki, D. Epema, H. Sips, The BitTorrent P2P File-Sharing System: Measurements and Analysis, In: 4th International Workshop on Peer-to-Peer Systems, vol. 3640, pp. 205-216, New York, 2005.
[6] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A Scalable Content-Addressable Network, In: Proceedings of the International Conference of the Special Interest Group on Data Communication, vol. 31, pp. 161-172, New York, 2001.
[7] C. Dana, D. Li, D. Harrison, C.-N. Chuah, BASS: BitTorrent Assisted Streaming System for Video-on-Demand, In: 7th Workshop on Multimedia Signal Processing, pp. 1-4, California, 2005.
[8] H. Chen, Z.L. Yang, J.H. Sun, Critical Load of Peer-to-Peer Networks, Journal of Huazhong University of Science and Technology (Natural Science Edition) 35(3), 35-37, 2007.
[9] W. Yang, N. Abu-Ghazaleh, GPS: A General Peer-to-Peer Simulator and its Use for Modeling BitTorrent, In: 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pp. 425-432, Atlanta, 2005.