BitTorrent Fairness Analysis
Team Asians: Zhenkuang He, Gopinath Vasalamarri
Topic Summary
Aim: test how fairness affects file transfer speed in a P2P environment (here, using the BitTorrent protocol).
Two types of fairness:
Good fairness: every peer in the system can download the files it wants and uploads what it has (when requested).
Bad fairness: some peers in the system do not upload the files they have but can still download what they want.
Goal: analyze the average file transfer speed in operating environments with different types of fairness.
Topic Summary: Hypothesis
The average (download) speed (within a given time) in the good-fairness system is higher than in the bad-fairness system.
AS = TDS / NN
AS: Average (download) Speed
TDS: Total Download Speed
NN: Number of Nodes that have participated in downloading
TDS = TFS / TT
TFS: Total Files (transferred) Size
TT: Total Time used
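The two formulas above can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the choice of kb and seconds as units, and the sample numbers are our own assumptions, not part of the original design.

```python
def total_download_speed(total_file_size_kb: float, total_time_s: float) -> float:
    """TDS = TFS / TT, in kb/s (units are an assumption of this sketch)."""
    if total_time_s <= 0:
        raise ValueError("total time must be positive")
    return total_file_size_kb / total_time_s


def average_speed(total_file_size_kb: float, total_time_s: float, num_nodes: int) -> float:
    """AS = TDS / NN, in kb/s per participating node."""
    if num_nodes <= 0:
        raise ValueError("need at least one participating node")
    return total_download_speed(total_file_size_kb, total_time_s) / num_nodes


# Hypothetical run: 10 nodes transferred 600,000 kb in 300 s.
tds = total_download_speed(600_000, 300)
avg = average_speed(600_000, 300, 10)
print(tds, avg)
```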
Topic Summary: P-value and t-test
If the p-value is large, or t is close to 0 (positive or negative), we conclude the hypothesis is false.
If the p-value is small (say, less than 0.05 or 0.01) but t is negative (< 0), the hypothesis is false.
If the p-value is small (say, less than 0.05 or 0.01) and t is positive (> 0), the hypothesis is true.
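The three rules above reduce to a single check. The sketch below is one possible reading of them: it assumes that t is computed as "good fairness minus bad fairness", so that a positive t points in the direction of the hypothesis. The function name and the default threshold `alpha` are hypothetical.

```python
def hypothesis_supported(t_stat: float, p_value: float, alpha: float = 0.05) -> bool:
    """Return True only for a small p-value together with a positive t.

    Assumes t_stat was computed as (good fairness) - (bad fairness).
    """
    if p_value >= alpha:
        # Large p-value: the difference is not significant.
        return False
    # Small p-value: the sign of t decides the direction.
    return t_stat > 0


print(hypothesis_supported(3.1, 0.004))   # small p, positive t
print(hypothesis_supported(-3.1, 0.004))  # small p, negative t
print(hypothesis_supported(0.2, 0.80))    # large p
```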
Research Paper I
Dongyu Qiu and R. Srikant. 2004. Modeling and performance analysis of BitTorrent-like peer-to-peer networks. SIGCOMM Comput. Commun. Rev. 34, 4 (August 2004), 367-378. DOI=10.1145/1030194.1015508 http://doi.acm.org/10.1145/1030194.1015508
Research Paper I: Problem Definition
1. Most previous models used Markov chains to study the scalability, performance, and efficiency of BT.
2. Such an approach is mathematically difficult to work with.
3. An alternative approach is a simple fluid model.
4. The fluid model yields a set of expressions that can easily be used to study various factors of BT.
Research Paper I: Key Points in the Paper
1. Peer evolution: the number of peers in the system is a strong factor in network performance.
2. Scalability: network performance is measured by the average file download time, and network size by the number of peers.
3. File-sharing protocol: match a peer with all other peers that have the file, so that the maximum download bandwidth is used.
Simple Fluid Model
x(t): number of downloaders (also known as leechers) in the system at time t.
y(t): number of seeds in the system at time t.
λ: the arrival rate of new requests. We assume that peers arrive according to a Poisson process.
μ: the uploading bandwidth of a given peer. We assume that all peers have the same uploading bandwidth.
c: the downloading bandwidth of a given peer. We assume that all peers have the same downloading bandwidth and c ≥ μ.
θ: the rate at which downloaders abort the download.
γ: the rate at which seeds leave the system.
η: the effectiveness of the file sharing, which we will describe shortly. η takes values in [0, 1].
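To show how these parameters interact, here is a rough numerical sketch. It assumes the fluid equations as we recall them from the cited paper, dx/dt = λ − θx − min{cx, μ(ηx + y)} and dy/dt = min{cx, μ(ηx + y)} − γy, and integrates them with a plain forward-Euler step. The function name, step size, initial state, and parameter values are all made up for illustration.

```python
def simulate_fluid(lam, mu, c, theta, gamma, eta,
                   x0=0.0, y0=1.0, dt=0.01, steps=20_000):
    """Forward-Euler integration of the fluid model; returns the final (x, y)."""
    x, y = x0, y0
    for _ in range(steps):
        # Total service rate: limited either by download or by upload capacity.
        rate = min(c * x, mu * (eta * x + y))
        dx = lam - theta * x - rate   # arrivals - aborts - completions
        dy = rate - gamma * y         # completions - seed departures
        # Populations cannot become negative.
        x = max(x + dt * dx, 0.0)
        y = max(y + dt * dy, 0.0)
    return x, y


# Made-up parameter values, chosen only so that c >= mu holds.
x_end, y_end = simulate_fluid(lam=1.0, mu=0.5, c=2.0, theta=0.0, gamma=1.0, eta=1.0)
print(f"downloaders ~ {x_end:.3f}, seeds ~ {y_end:.3f}")
```

For these particular values the trajectory should settle near x = 1 and y = 1, the point where arrivals, completions, and seed departures balance.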
Steady State Performance: Important Points to Note
T is the average download time of a peer in steady state.
1. T is not related to λ.
2. When η increases, T decreases.
3. When γ increases, T increases, because seeds leave the system sooner.
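The dependence of T on the parameters can be checked numerically. The sketch below assumes the steady-state expression as we recall it from the cited paper, T = 1 / (θ + β) with 1/β = max{1/c, (1/η)(1/μ − 1/γ)}; the function name and all sample values are hypothetical.

```python
def average_download_time(mu, c, theta, gamma, eta):
    """T = 1 / (theta + beta), with 1/beta = max(1/c, (1/eta) * (1/mu - 1/gamma))."""
    inv_beta = max(1.0 / c, (1.0 / eta) * (1.0 / mu - 1.0 / gamma))
    beta = 1.0 / inv_beta
    return 1.0 / (theta + beta)


# The arrival rate (lambda) does not appear in the expression at all.
base = average_download_time(mu=0.5, c=2.0, theta=0.0, gamma=1.0, eta=1.0)
lower_eta = average_download_time(mu=0.5, c=2.0, theta=0.0, gamma=1.0, eta=0.5)
higher_gamma = average_download_time(mu=0.5, c=2.0, theta=0.0, gamma=2.0, eta=1.0)
print(base, lower_eta, higher_gamma)  # about 1.0, 2.0 and 1.5
```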
Steady State Performance: Important Points to Note
4. The above equation says that downloaders must also upload the data they have for the system to survive.
Experiment I
Figure: Normalized Number of Seeds vs. Time
Figure: Normalized Number of Downloaders vs. Time
Experiment II
Uses log events from the nodes (e.g., when a node joins or leaves the system, or when a download completes).
Log events can also carry numeric data, such as the total amount of data uploaded/downloaded so far and the number of bytes still needed.
From this information, various BT parameters can be estimated, namely the arrival rate of new requests and the rate at which downloaders abort the download.
Experiment II
These log events alone cannot determine whether the system has an upload or a download bottleneck.
This can be found with the formula: total uploading rate / number of peers uploading
Experiment II
Figure: Number of Seeds vs. Time
Figure: Number of Downloaders vs. Time
Research Paper I: Lessons for Our Project
The calculation of the peers' uploading bandwidth in Experiment II serves as the basis for calculating the peers' download rate in our project.
Research Paper II
J.A. Pouwelse, P. Garbacki & D.H.J. Epema (2005). The Bittorrent P2P File-Sharing System: Measurements and Analysis. IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, Pages 4-4.
The authors present a measurement study of BitTorrent that focuses on three issues: activity, availability, and download performance.
Research Paper II: The BitTorrent File-Sharing System
BitTorrent is a file-downloading protocol.
Each peer is responsible for maximizing its own download rate by contacting suitable peers; peers with high upload rates will, with high probability, also be able to download at high speeds.
When a peer has finished downloading a file, it may become a seed by staying online for a while and sharing the file for free.
Measurement Results: Activity
Daily cycle: the minimum and maximum number of downloads occur at roughly the same time each day.
Large variation: due to failures of the mirrors themselves, the .torrent servers, or the trackers.
We conclude that the number of active users in the system is strongly influenced by the availability of the global components in BitTorrent/Suprnova.
Measurement Results: Availability
Reliable web hosting of the Suprnova pages is a problem.
The .torrent file servers are even less reliable.
Unavailability has a significant influence on popularity.
The high frequency of such failures is apparent.
There is an obvious need to decentralize the global components.
Seeds with high availability are rare.
Measurement Results: Download Performance
90% of the peers had a download speed below 520 kbps.
The average download speed of 240 kbps allowed peers to fetch even large files in one day.
An important observation is the power-law relation between the average download speed and the number of downloads at that speed.
The number of seeds after 10 days is not an accurate predictor of the content lifetime.
Files with only a single seed can still have a relatively long content lifetime.
BitTorrent itself does not have incentives to seed.
When users do not upload sufficiently, their access is temporarily denied.
Research Paper II: Conclusions
Activity
Availability
Download performance
Contributions to Our Investigation
Number of nodes/peers
Whether fairness is good or bad
Effect on the average download speed
Research Paper III
Ashwin R. Bharambe, Cormac Herley & Venkata N. Padmanabhan (2005). Analyzing and Improving a BitTorrent Network's Performance Mechanisms. INFOCOM 2006. 25th IEEE International Conference on Computer Communications.
Research Paper III: Problems Discussed
1. BT follows a tit-for-tat (TFT) policy whereby nodes preferentially upload to peers from whom they are able to download at a fast rate in return.
2. Choking and unchoking.
3. How effective is the BitTorrent TFT policy in ensuring that nodes cannot systematically download much more data than they upload? That is, does the system allow unfairness?
4. Altruistic uploading: nodes keep uploading even after finishing their downloads. What if the nodes leave...?
Research Paper III: Simulation Model Assumptions
1. Network propagation delay is ignored.
2. The endgame mode of BT is ignored.
3. The only bottleneck is the upload/download link speed.
Research Paper III: Metrics
1. Link utilization: avg(all download speeds) / avg(maximum download speeds). The mean download time is inversely related to the average uplink utilization.
2. Fairness: the BT TFT mechanism. We look at the edge conditions in which a node uploads more than it receives.
3. Load on the seeds: this metric covers situations where nodes depart the system as soon as they finish their downloads. For the system to be scalable, the load per seed should remain constant (or increase only slightly) as the number of leechers in the system increases.
Research Paper III: Experimental Setups
Two Environments
1. Homogeneous: every leecher has the same downlink/uplink bandwidth.
2. Heterogeneous: leechers do not all have the same downlink/uplink bandwidth.
A) High-end cable (6000/3000 Kbps), high-end DSL (1500/400 Kbps), and low-end DSL (784/128 Kbps).
B) Interesting behavior arises in a heterogeneous system, where nodes "compete".
HOMOGENEOUS environment: Experiments and Results
1. Normalized number of blocks vs. number of seeds: not counterintuitive; it works as one expects.
2. Normalized number = (number of blocks served) / (number of blocks in one full copy of the file).
HOMOGENEOUS environment: Mean upload utilization vs. seed bandwidth
Experiment
1. Vary the number of seeds in the system, from a single seed to multiple seeds.
2. Vary the bandwidth of each of the seeds.
HOMOGENEOUS environment: Concurrent uploads
Limit the number of connections. Two problems:
1. Division of the node's (leecher's) upload bandwidth.
2. Underutilized upload pipe.
HETEROGENEOUS environment: Experiments and Results
Problems in BT
It follows a "rate-based" TFT policy with optimistic unchoking, but it can still run into problems.
HETEROGENEOUS environment: Solutions by the paper
1. Quick Bandwidth Estimation (QBE)
2. Pairwise Block-Level TFT: Uab ≤ Dab + Δ
Uab = number of blocks that A uploads to B
Dab = number of blocks that A downloads from B
Δ = unfairness threshold on this peer-to-peer connection
The maximum number of extra blocks served by a node is bounded by dΔ, where d is the size of its neighborhood.
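The pairwise block-level rule is easy to express as a check that node A runs before sending another block to B. The sketch below is our own reading, not the paper's code: it assumes the constraint must still hold after the block is sent, and the function names are hypothetical.

```python
def can_send_block(u_ab: int, d_ab: int, delta: int) -> bool:
    """A may send one more block to B only if Uab <= Dab + delta still holds afterwards."""
    return u_ab + 1 <= d_ab + delta


def max_extra_blocks(neighborhood_size: int, delta: int) -> int:
    """Upper bound on the blocks a node can give away unreciprocated: d * delta."""
    return neighborhood_size * delta


# With delta = 2, A can get at most 2 blocks ahead of what B has returned.
print(can_send_block(u_ab=0, d_ab=0, delta=2))  # A is not ahead yet
print(can_send_block(u_ab=2, d_ab=0, delta=2))  # A is already 2 ahead
print(can_send_block(u_ab=2, d_ab=1, delta=2))  # B sent one block back
print(max_extra_blocks(neighborhood_size=7, delta=2))
```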
HETEROGENEOUS environment
Figure: Mean Upload Utilization vs. Node Degree
Figure: Maximum Normalized Number of Blocks vs. Node Degree
HETEROGENEOUS environment: Bandwidth-matching tracker policy
Figure: Mean Upload Utilization vs. Node Degree
Figure: Max #Blocks Served vs. Node Degree
Figure: Mean Download Time vs. Node Category
Research Paper III: Summary
1. The BT TFT policy is not efficient.
2. Pairwise TFT + bandwidth matching.
3. Seed bandwidth is the most critical resource.
4. Local Rarest First can be used for new peers.
Ideas for our project
1. Relationship of average uplink utilization vs. mean download time.
2. QBE and TFT ideas for choosing peers.
Idea Design: Node (Peer)
1. Norms
Max Upload Speed (capacity), e.g. MUS: 200 kb/s
Max Download Speed (capacity), e.g. MDS: 500 kb/s
File list, e.g.:
File  Size
A     20M
B     5M
D     100M
G     36M
M     85.8M
O     12.1M
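Idea Design: Node (Peer)
2. Algorithms
Upload Speed Calculate Algorithm
DSn = MDSn * (MUS / (MDS1 + MDS2 + ... + MDSn))
DSn is the speed that downloader n receives from an uploader with capacity MUS.
Example: an uploader with MUS: 500 kb/s serves three downloaders with MDS: 120 kb/s, 180 kb/s, and 500 kb/s. They receive 75 kb/s, 112.5 kb/s, and 312.5 kb/s, which sum to 500 kb/s.
Download Speed Calculate Algorithm
USn = MUSn * (MDS / (MUS1 + MUS2 + ... + MUSn))
USn is the speed that uploader n sends to a downloader with capacity MDS.
Example: a downloader with MDS: 100 kb/s pulls from three uploaders with MUS: 200 kb/s, 300 kb/s, and 500 kb/s. They send 20 kb/s, 30 kb/s, and 50 kb/s, which sum to 100 kb/s.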
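Both formulas share the same shape: one node's capacity is split among its partners in proportion to the partners' own capacities. A minimal sketch under that reading follows; the function name `proportional_shares` is our own, and the sketch follows the formulas literally, so it does not cap a share at the partner's own limit.

```python
def proportional_shares(capacity: float, partner_limits: list[float]) -> list[float]:
    """Split `capacity` among partners in proportion to their own limits.

    Upload case:   capacity = MUS of the uploader, partner_limits = MDS1..MDSn.
    Download case: capacity = MDS of the downloader, partner_limits = MUS1..MUSn.
    """
    total = sum(partner_limits)
    if total == 0:
        return [0.0 for _ in partner_limits]
    return [limit * capacity / total for limit in partner_limits]


# Numbers from the slides above.
upload_split = proportional_shares(500, [120, 180, 500])
download_split = proportional_shares(100, [200, 300, 500])
print(upload_split)    # [75.0, 112.5, 312.5]
print(download_split)  # [20.0, 30.0, 50.0]
```

Note that when the capacity exceeds the sum of the partners' limits, the formula hands out more than a partner can use; a fuller simulation would probably clamp each share to the partner's limit.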
Idea Design: Node (Peer)
2. Algorithms: Real Situation
Figures: a sequence of diagrams applying both algorithms at once to nodes A, B, and C. The first repeats the example of an uploader with MUS: 500 kb/s serving downloaders with MDS: 120 kb/s, 180 kb/s, and 500 kb/s (75 kb/s, 112.5 kb/s, 312.5 kb/s). Later diagrams add uploaders with MUS: 300 kb/s and MUS: 200 kb/s and downloaders with MDS: 200 kb/s and MDS: 1000 kb/s, with the per-link speeds rescaled accordingly.
Idea Design: File List
1) The complete set contains 26 files, named A to Z.
2) Every file's size is randomly initialized before each experiment.
3) Every node in the system has a file list. How many files are in each list, and which ones, is random.
Example file list:
File  Size
A     20M
B     5M
D     100M
G     36M
M     85.8M
O     12.1M
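The random initialization described above could look roughly like the following. The size range (1 to 100 MB), the seeded generator, and the function names are assumptions made only for this sketch.

```python
import random
import string


def make_catalog(rng: random.Random, min_mb: float = 1.0, max_mb: float = 100.0) -> dict:
    """Assign a random size (in MB) to each of the 26 files A..Z."""
    return {name: round(rng.uniform(min_mb, max_mb), 1) for name in string.ascii_uppercase}


def make_file_list(rng: random.Random, catalog: dict) -> list:
    """Pick a random number of files, and random files, for one node."""
    count = rng.randint(0, len(catalog))
    return sorted(rng.sample(sorted(catalog), count))


rng = random.Random(42)  # fixed seed so an experiment can be repeated
catalog = make_catalog(rng)
node_files = make_file_list(rng, catalog)
print(len(catalog), node_files)
```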
Idea Design: Download Time (per file)
DT = FS / DS
DT: Download Time
FS: File Size
DS: Download Speed
But the download speed is not fixed; it might change every second.
Idea Design: Time Calculate Algorithm (per node, per file)
1) We take 1 second as the unit of time.
2) Every second, we calculate the node's download speed with the algorithms described before.
3) For that second, if the speed is n kb/s, we decrease the remaining file size by n kb.
4) We repeat 2) and 3) every second until the remaining file size is ≤ 0, which means the file is completely downloaded.
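The per-second loop can be sketched as below. The callback `speed_at`, which returns the node's download speed in kb/s for a given second, is a hypothetical stand-in for the speed algorithms, and `max_seconds` is an added safety limit so that a stalled download cannot loop forever.

```python
from typing import Callable, Optional


def download_time_seconds(file_size_kb: float,
                          speed_at: Callable[[int], float],
                          max_seconds: int = 1_000_000) -> Optional[int]:
    """Count the seconds needed to download a file whose speed may change every second."""
    remaining = file_size_kb
    elapsed = 0
    while remaining > 0:
        if elapsed >= max_seconds:
            return None  # gave up: the download never finished
        remaining -= speed_at(elapsed)  # n kb/s for one second removes n kb
        elapsed += 1
    return elapsed


# Hypothetical speed profile: 300 kb/s for the first two seconds, then 100 kb/s.
def profile(second: int) -> float:
    return 300.0 if second < 2 else 100.0


seconds_needed = download_time_seconds(1000, profile)
print(seconds_needed)  # 6
```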
Simulation Design
Take one experiment as an example:
1) Initialize the complete file list. Files A to Z get different, randomly chosen sizes.
2) Set the number of nodes in the system for this experiment.
3) Initialize all the nodes in the system, including the Max Upload Speed (MUS), Max Download Speed (MDS), and file list of each node.
All of the above stays fixed across every condition of this experiment.
Then begin the first condition test (good fairness):
a) We assume that every node needs to get a copy of every file in the complete list, and that it is willing to download the files it does not have yet.
b) Each node randomly chooses a file it does not have and downloads it from other nodes. Moreover, we assume that it can successfully connect to all other nodes that have that file and download it from them.
c) When a node finishes downloading one file, we assume that it can connect to all other nodes to download another file it needs within 1 second.
d) If a file does not exist anywhere in the system (that is, it cannot be downloaded from any node), it is removed from the complete file list, so the nodes do not need to download it.
e) We simulate one condition for a fixed, given time (TT) and calculate the Total Files (transferred) Size (TFS). (We calculate it once every second and sum the values at the end.)
f) Recalling the formulas defined before, we calculate the Total Download Speed (TDS) as TDS = TFS / TT, and the Average (download) Speed (AS) as AS = TDS / NN.
Simulation Design
This gives us the result data for the standard condition.
We then run tests under different conditions (bad fairness).
The conditions differ in the percentage of nodes in the system that stop uploading.
We compare the result data of each bad-fairness condition with the result data of the standard condition (good fairness) using the p-value and t-test.
We repeat the experiment many times with different fixed parameters (those from steps 1), 2), and 3) mentioned before).
Each experiment gives different result data.
Finally, we use those results to verify our hypothesis.
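The comparison between the standard condition and a bad-fairness condition needs a t statistic. The design does not say which t-test is meant, so the sketch below assumes Welch's unequal-variance test and computes only t and the degrees of freedom with the standard library; the sample speeds are invented. The p-value itself needs the t distribution, which a statistics package provides (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's t-test; t > 0 when sample_a has the larger mean."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se2 = va / na + vb / nb                          # squared standard error
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Invented average speeds (kb/s) from repeated experiments.
good_fairness = [210.0, 205.0, 198.0, 220.0, 215.0]
bad_fairness = [150.0, 160.0, 155.0, 149.0, 158.0]
t_stat, dof = welch_t(good_fairness, bad_fairness)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```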
Questions?
Thank you!