Combining Coding and Block Schemes for P2P Transmission

Ben Dodson, Dr. Zach Ives, Doug Petkanics, Dr. Sanjeev Khanna
April 14, 2006

Abstract

We introduce a new method for distributing large files in a peer-to-peer network based on a combination of rateless codes, network coding, and block transmission. Each of these schemes contributes in a different way to the collaborative transfer of a file. We believe a combination of these schemes will improve the download rate, since each scheme creates its own bottleneck at some point during the transmission. By using randomization and rateless codes to generate encodings of information during a file transfer, we aim to ensure that the receiver almost always receives useful information. As peers holding large amounts of information drop out of the network, it generally becomes hard to obtain certain rare pieces of a file. Because we use randomized encodings, we attempt to eliminate this problem by ensuring that there is information to be gained about every piece of a file. We develop a protocol, analyze its properties, and implement it so that we can run tests in a real network environment. The network coding and rateless codes help file transfers progress even if peers frequently leave or join the network during transmission. By using encodings, we obtain a far more robust network in which rare pieces no longer exist. We modify the protocol of an existing p2p client to perform these measurements.

1 Current Approaches

In recent years, peer-to-peer file transfer has gained immense popularity. From programs such as Napster, originally intended for sharing music files, to programs such as Kazaa and Gnutella, which carried more varied content, users have willingly transferred increasingly large files to one another.
Only recently have these collaborative methods of file transfer started to compete with the simple client-server method for many types of files, such as Linux CD images.

Today, large-scale file transfers over networks are more prevalent than ever, with uses ranging from software patch distribution to video on demand. In p2p environments, files were originally transferred from one user to another, but schemes have since evolved in which many users with the same file, or pieces of it, can all actively aid in the transmission to another user, so that his download speed becomes the only bottleneck.

1.1 BitTorrent: A Block Transfer Scheme

Recently, the system known as BitTorrent [1] has emerged as one of the most commonly used p2p programs on the Internet. BitTorrent has many advantages. For example, the central server does not need to communicate with the downloaders at all after the original handshaking process. BitTorrent also uses a tit-for-tat trading mechanism, under which a downloader cannot get access to new pieces of the file he is downloading unless he is also contributing to the network as an uploader. This greatly reduces free-riding. Another advantage is that even if the initial server goes offline, file transfers to other users can continue as long as one user who has the entire file continues to act as a seed. BitTorrent has become so popular, in fact, that it made up over 50% of p2p traffic on the Internet in June 2004 [2].

Although BitTorrent works extremely well in practice, it is possible to do better. The problem with BitTorrent is that it uses block transfer: it breaks the file into k blocks and transfers these individually, so only k distinct blocks are ever useful to a downloader. As users receive these blocks, they may pass them on to other users on request. Often, a set of blocks can become exceedingly rare in the network, especially when the peer group lacks seeds. These rare blocks then become the bottleneck, causing the end of a file transfer to take an unnecessarily long time.
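To make the rare-block problem concrete, the following minimal Python sketch shows a simplified rarest-first choice of the kind BitTorrent uses as its countermeasure. It is not the reference client's piece picker; the function name `rarest_first` and the model of each peer as a set of block indices are illustrative assumptions.

```python
from collections import Counter


def rarest_first(have, peers_have, rng=None):
    """Pick a missing block that some peer can supply, preferring the fewest copies.

    have: set of block indices we already own.
    peers_have: list of sets, one per connected peer (a simplified swarm model).
    rng: optional random.Random used to break ties; otherwise the lowest index wins.
    """
    counts = Counter()
    for blocks in peers_have:
        counts.update(blocks)              # one copy per peer holding the block
    candidates = [b for b in counts if b not in have]
    if not candidates:
        return None                        # nothing new is available from peers
    fewest = min(counts[b] for b in candidates)
    rarest = sorted(b for b in candidates if counts[b] == fewest)
    return rng.choice(rarest) if rng else rarest[0]


peers = [{0, 1, 2}, {1, 2}, {2, 3}]
print(rarest_first({0}, peers))        # 3: only one peer holds it
print(rarest_first({0}, peers[:2]))    # 1: block 3 left with the third peer
```

In the usage lines, removing the third peer makes block 3 vanish from the swarm entirely; no selection policy can recover it until a peer holding it reappears, which is the gap the coded schemes below are meant to close.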
BitTorrent attempts to handle this with a rarest-first block transfer policy. BitTorrent also has an endgame mode built into the software, specifically designed to aid in downloading the last few pieces. According to Cohen, this is needed because of the tendency for the last few blocks to be downloaded from a single hosed modem line. The rarest-first policy is hindered by the fact that the rareness of blocks changes over time. We introduce a method that removes the idea of a rare block: by incorporating coding techniques, there can be many more than k blocks, of which only k are required. This reduces the need for both the rarest-first policy and the explicit endgame transfer mode.

1.2 Avalanche: A Random Coding Based Scheme

A good solution to this problem is to introduce a randomized encoding scheme and transfer encoded pieces of data derived from the original file, rather than the original blocks themselves. In this approach, the original data is still broken down into blocks. A rateless encoding is then used to generate as many encoding symbols as are required (a practically unlimited number are available). A common way to encode packets is simply to take the exclusive-or (XOR) of some subset of the original blocks. These XORed encoding symbols can be thought of as a system of linear equations, and the user only needs to receive enough encoding symbols to solve the system in order to recover the original file. A very powerful and efficient scheme is the Digital Fountain [3] approach, in which it is proven that, with high probability, a downloader only needs to receive k(1 + δ) encoding symbols to decode original data of k blocks, where δ is small.

Taking Digital Fountain one step further, a new system called Avalanche [4] has been proposed, which appears to be the first use of rateless codes in a p2p setting in which a downloader receives data from many different users on the network. Avalanche combines the Digital Fountain approach with network coding techniques to achieve a significant improvement over BitTorrent in simulation. Because users now receive encoded information from which they can learn a little about multiple blocks at once, file transfers are much faster. Unfortunately, the network coding techniques used in Avalanche can lead to redundant data being sent to the downloader repeatedly; after a while, the data he receives becomes less and less unique. However, because the blocks included in each encoding are selected randomly, the specific block a user needs is much less likely to be as rare as it would be under the block transfer approach. Also, the simulations of Avalanche do not seem to realistically mimic real network conditions.

2 Our Strategy

2.1 A Compromise

We feel that a compromise between the BitTorrent and Avalanche approaches leads to the best performance.
We attempt to introduce encoding techniques into the BitTorrent protocol to eliminate the rare block. Using encodings rather than plain block transfer greatly increases the robustness of the network, since information about a particular block is far more likely to be present somewhere in the network when that block could be included in any encoding. With plain block transfer, the rare block itself must have been disseminated from the seed; otherwise it will not exist in the network should the seed drop out.

One major goal of our project is to determine how best to combine the block transmission and encoding transmission methods. There is much to consider in how this will technically be achieved: the two schemes work well under different conditions, and we seek to create peers that can determine when to use the appropriate scheme, preferably based only on local information. For example, as explored in [5], BitTorrent works exceedingly well when there is a sizable number of seeds (there, 20 percent of the network), but may not when this number is far smaller. Furthermore, we can remove the need for a rarest-first policy by using an encoding scheme, since many different encodings can give enough information about a block to decipher it. Avalanche works well when nodes are continuously receiving new information, as their encodings are generated by combining all blocks known to them. One of Avalanche's shortcomings is that it suffers from slow decoding times. We aim to use the strengths of both schemes while minimizing their weaknesses. The Avalanche scheme alleviates BitTorrent's need for seeds in a network. Restricting our field to order two (plain XOR over GF(2)) rather than order 2^16 will reduce decoding times, as will transmitting some known blocks per request. A more dynamic encoding process than that of Avalanche will yield a wider variety of useful packets between peers. Furthermore, we explore the possibility of downloading these files in a heterogeneous environment of both standard BitTorrent clients and new clients using an encoding scheme.

In order to achieve our desired results, we require a handful of tools, mainly drawn from graph theory, linear algebra, and probability theory. The coding theory used is largely based on the principles of linear algebra, but we gain insight by thinking of it in the context of graph theory. Our scheme for selecting blocks for an encoding requires a combination of graph theory and randomization. We begin by motivating our scheme with an analysis of randomization in encodings (Section 3.1). We then select an appropriate encoding scheme that offers fast, simple decoding yet still gives us all the advantages discussed previously (Section 3.2). We analyze this scheme to bound cover times for all blocks in the network, and to show that it creates a far more robust network than BitTorrent.
After the protocol has been developed and analyzed, we implement it by modifying an existing BitTorrent client, the original BitTorrent implementation; our implementation is described in detail in Section 3.3. By modifying an existing client, we obtain a working program that could easily be used in practice. We run tests using our modified client and report our results in Section 4.

3 A Hybrid P2P Scheme

As planned, the first semester was spent motivating a hybrid p2p scheme and performing research and analysis. Basic theory has been worked out to show that there is indeed a benefit to implementing this new protocol, and that improvements can be made over both the BitTorrent and the Avalanche protocols. We have developed an implementation of this new protocol by modifying an existing BitTorrent client, and have run simulations and tests to verify that we achieve a performance gain.

3.1 Motivation and Analysis

We now consider why our proposed transmission scheme will be beneficial in a p2p environment. The intuition is as follows: with a simple block transmission scheme over n blocks, a user completes the download precisely when all n distinct blocks have been downloaded. With a coding scheme, the universe of available blocks expands to some larger size N, and only a subset of these encodings (optimally still n) is required.

It is known that BitTorrent's block transmission protocol functions well in a stable environment in which many users possessing the entire file remain available for future downloaders [5]. We consider instead an environment in which we expect BitTorrent not to function optimally: a network in which nodes are frequently being added and removed. In the worst case, nodes are added and removed in such a way that BitTorrent's method for handling rare blocks fails. We then inject randomly selected blocks from a single seed into this network. The quantity of interest is the cover time: how long do we expect to wait before the network contains all n blocks?

Assume there are $k$ distinct blocks in the network. Define a 0-1 random variable $X_k$ that is 1 when we receive a new distinct block, given that $k$ distinct blocks are already present, and let $T_k$ be the wait for a new distinct block given that $k$ exist in the network. We have:

$$\Pr[X_k = 1] = \frac{n-k}{n}, \qquad E[T_k] = \frac{n}{n-k}.$$

Thus, with $T = \sum_{i=0}^{n-1} T_i$:

$$E[T] = \sum_{i=0}^{n-1} E[T_i] = \sum_{i=0}^{n-1} \frac{n}{n-i} = n \sum_{j=1}^{n} \frac{1}{j} = O(n \log n).$$

Now consider a scheme using encodings. With an XOR encoding scheme, a single encoding is the XOR of $d$ blocks chosen from our original $n$. We allow all possible encodings, for a universe of size $2^n - 1$. An encoding is now useful if it is not a linear combination of the $k$ useful encodings already received; exactly $\sum_{i=1}^{k}\binom{k}{i} = 2^k - 1$ of the $2^n - 1$ nonzero encodings are such combinations. Thus:

$$\Pr[X_k = 1] = 1 - \frac{\sum_{i=1}^{k}\binom{k}{i}}{\sum_{i=1}^{n}\binom{n}{i}} = 1 - \frac{2^k - 1}{2^n - 1}, \qquad E[T_k] = \frac{1}{1 - \frac{2^k - 1}{2^n - 1}} = \frac{2^n - 1}{2^n - 2^k}.$$
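The expected cover times above can be checked numerically. The Python sketch below (Python because the modified client is written in it; all function names are hypothetical) sums the per-step waits exactly and compares them with Monte Carlo runs. It assumes, as the analysis does, that every draw is uniform: a uniform block for plain transfer, and a uniform nonzero n-bit mask for XOR encodings. Usefulness of an XOR encoding is tested by Gaussian elimination over GF(2) on bitmasks, so an encoding counts as useful exactly when it is not in the span of those already received.

```python
import random


def insert_if_useful(basis, v):
    """Gaussian elimination over GF(2) on bitmasks.

    basis maps a pivot bit to a stored vector whose highest set bit is that pivot.
    Returns True (and stores v) if v is not a linear combination of the basis.
    """
    while v:
        top = v.bit_length() - 1
        if top not in basis:
            basis[top] = v
            return True
        v ^= basis[top]          # clear the pivot bit and keep reducing
    return False


def block_cover_time(n, rng):
    """Plain block transfer: draw uniform blocks until all n have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def xor_cover_time(n, rng):
    """XOR encodings: draw uniform nonzero n-bit masks until the rank reaches n."""
    basis, draws = {}, 0
    while len(basis) < n:
        insert_if_useful(basis, rng.randrange(1, 1 << n))
        draws += 1
    return draws


def expected_block(n):
    # E[T] = sum_k n / (n - k) = n * H_n
    return sum(n / (n - k) for k in range(n))


def expected_xor(n):
    # E[T] = sum_k (2^n - 1) / (2^n - 2^k)
    full = 2 ** n
    return sum((full - 1) / (full - 2 ** k) for k in range(n))


rng = random.Random(0)
n, trials = 16, 2000
for name, exact, sim in [("block", expected_block, block_cover_time),
                         ("xor", expected_xor, xor_cover_time)]:
    mean = sum(sim(n, rng) for _ in range(trials)) / trials
    print(f"{name}: exact {exact(n):.2f}, simulated {mean:.2f}")
```

For n = 16 the exact values are about 54.1 draws for plain blocks (n·H_n) against about 17.6 for uniformly random XOR encodings, consistent with a logarithmic factor for block transfer and a linear cover time for encodings.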

Summing these per-step waits for the XOR scheme, and noting that the largest term is the last one ($i = n - 1$, where $2^n - 2^{n-1} = 2^{n-1}$):

$$E[T] = \sum_{i=0}^{n-1} E[T_i] = \sum_{i=0}^{n-1} \frac{2^n - 1}{2^n - 2^i} \le n \cdot \frac{2^n - 1}{2^{n-1}} < 2n.$$

Hence, with this block encoding scheme, we have a linear cover time.

Now we examine a new encoding scheme, in which a user receives packets in such a way that the degree of the encoding, $\deg(k)$, is a function of the number of encodings the user has already received ($k$). For plain (degree-one) blocks the expected wait is as before:

$$\Pr[X_k = 1] = \frac{n-k}{n}, \qquad E[T_k] = \frac{n}{n-k}.$$

Letting $k = pn$ gives $E[T_k] = \frac{1}{1-p}$, which grows without bound as $p \to 1$. Now let $\deg(k) = \frac{b}{1-p}$ for a small constant $b$. If the $\deg(k)$ blocks of an encoding are chosen independently and uniformly, the encoding contains only blocks the user already has with probability $p^{\deg(k)} \le e^{-b}$ (because $\ln p \le p - 1$), so it contains at least one new block with probability at least $1 - e^{-b}$. Then $E[T_k] \le \frac{1}{1 - e^{-b}} = O(1)$, and thus:

$$E[T] = \sum_{i=0}^{n-1} E[T_i] = O(n).$$

In the collaborative download environment, we would not be able to allow all encodings in the network because of the need for data verification via hash codes. Rather, we would like to predefine cn encodings, for a reasonable constant c, that will be transmittable. Furthermore, we require efficient, on-the-fly decoding of our encoded blocks. These two restrictions lead us to the Twisted-Tree encoding scheme.

3.2 Twisted-Tree Encoding Scheme

While the Digital Fountain encoding scheme is ideal for a single-server, many-client transmission model, its requirements are too restrictive for a collaborative download environment like BitTorrent. For example, we cannot guarantee the randomness of transmitted blocks, since a client will certainly have a large degree of redundant data available to its peers. Furthermore, we would like to perform our encoding and decoding on the fly to maximize the availability of blocks that can be distributed between peers at any given time. Lastly, we can only allow certain encodings in our network, since the data must be verifiable by hash codes sent before the file is downloaded. Thus, we introduce the Twisted-Tree encoding scheme.

Consider a file split into n blocks. We begin by constructing an encoded block in which all n blocks are XOR-ed together. This is the root of our encoding tree. For a node of degree k (that is, an XOR of k blocks), we recursively construct a binary tree by encoding the leftmost k/2 blocks in the left child and the remaining k/2 blocks in the right child (rounding when k is odd). We continue until we reach the leaves of the tree, which correspond to our original n blocks. In total, a tree has at most 2n − 1 nodes, of which n are basic blocks. This encoding scheme thus restricts the size of the universe of allowable encodings, as required for data verification in our download environment. Furthermore, encoding and decoding can be done on the fly based only on the local properties of a node in the tree: a parent is the XOR of its two children, so any two of the three determine the third. The runtime of a single decode is no worse than O(log n).

However, this scheme alone does not improve the random cover time much. To approach the O(n) cover time, we construct multiple trees in a similar way, randomly permuting the original blocks before constructing each additional tree. We now have T trees of identical structure but with mostly distinct internal nodes. The leaves are the same in all trees, so their hash values need only be computed and transmitted once. For T trees, we have no more than (T + 1)n allowable encodings and decode times of O(Tn). Note that an internal node with the same encoding can appear in multiple trees, as is always the case for the root nodes, but this lookup can be made constant-time, so decode time is not affected.

In general, we may need to receive more than n useful blocks before the file can be decoded; that is, when treated as an erasure code in which purely random blocks are received, our encoding scheme is not overhead-free. However, for T ≤ 3 this overhead is less than 1% of the overall file size. In our experiments, the worst case we saw was just under 10% overhead (189 blocks more than the required 2048).

3.3 Implementation

In considering approaches to an implementation, there were a number of options to choose from.
One choice was to implement a simple version of the BitTorrent protocol, a simple version of Avalanche, and then our protocol combining the two approaches. We could then run the three protocols in a simulated environment under the same conditions and vary a number of factors, including the number of seeds, resiliency, nodes leaving and entering the network at random, and upload and download speeds. This approach has many drawbacks, the most important being that our simple implementations would not model the complex protocol features BitTorrent has recently taken on, and therefore would not accurately gauge real-world BitTorrent performance. Another drawback is that real-life users are very difficult to model in a simulated environment. A second choice was to modify a BitTorrent client to work in a simulated environment. In this case, the clients would not actually communicate with one another, but would instead communicate with a central server that keeps track of all the simulation details and results.

Figure 1: A Twisted-Tree encoding with T = 2. In blue, blocks that have not yet been decoded. In green, pieces previously known to the peer. In yellow, a block that has just been received by the peer. In orange, pieces that can now be decoded.
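The decoding step illustrated in Figure 1 can be sketched in Python as below. This is a minimal model under stated assumptions, not the project's encoding class: encodings are bitmasks over block indices, each node's left child takes the first floor(k/2) blocks, tree 0 is unpermuted, and "decoding" only tracks which encodings are known rather than their contents. The helper names (`build_tree`, `twisted_trees`, `closure`, `random_request_run`) are hypothetical. `random_request_run` mirrors the random-request experiment of Section 4 at a much smaller n.

```python
import random


def build_tree(order):
    """Return (parent, left, right) bitmask triples for one tree over `order`.

    Bit i set in a mask means block i is XOR-ed into that encoding.
    Assumption: the left child takes the first floor(k/2) blocks.
    """
    triples = []

    def build(lo, hi):
        if hi - lo == 1:
            return 1 << order[lo]              # leaf: a single basic block
        mid = (lo + hi) // 2
        left, right = build(lo, mid), build(mid, hi)
        triples.append((left | right, left, right))
        return left | right

    build(0, len(order))
    return triples


def twisted_trees(n, T, rng):
    """Union of the parent/child relations of T trees; tree 0 is unpermuted."""
    relations = set()
    for t in range(T):
        order = list(range(n))
        if t > 0:
            rng.shuffle(order)
        relations.update(build_tree(order))
    return list(relations)


def closure(known, relations):
    """Grow `known` until no relation parent = left XOR right has exactly two
    known members (the trickle-down decoding)."""
    changed = True
    while changed:
        changed = False
        for parent, left, right in relations:
            if (parent in known) + (left in known) + (right in known) == 2:
                known.update((parent, left, right))
                changed = True
    return known


def random_request_run(n, T, rng):
    """Request uniformly random allowable encodings until every basic block is
    decodable; return (useful, redundant) reception counts."""
    relations = twisted_trees(n, T, rng)
    leaves = {1 << i for i in range(n)}
    universe = sorted(leaves | {parent for parent, _, _ in relations})
    known, useful, redundant = set(), 0, 0
    while not leaves <= known:
        mask = rng.choice(universe)
        if mask in known:
            redundant += 1
        else:
            useful += 1
            known.add(mask)
            closure(known, relations)
    return useful, redundant


# Figure 1-style trickle: one tree over 8 blocks, blocks 0..6 already known.
rels = twisted_trees(8, 1, random.Random(0))
known = closure({1 << i for i in range(7)}, rels)
print(128 in known)       # False: block 7 is not yet decodable
known.add(0xFF)           # receive the root, the XOR of all 8 blocks
closure(known, rels)
print(128 in known)       # True: root -> right subtree -> block 7

# Small-scale version of the random-request experiment (n = 64, 10 runs per T).
rng = random.Random(1)
for T in (0, 1, 2, 3):
    runs = [random_request_run(64, T, rng) for _ in range(10)]
    print(T, sum(r for _, r in runs) / len(runs))
```

The closure here is a naive fixpoint loop over all relations, which is fine for small n; at n = 2048 one would instead index relations by mask so that only the neighbors of a newly known node are revisited.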

While these clients would still implement our protocol, they would only work in simulation and would not transfer real data among one another. The advantage of this approach is that we could run various simulations modeling different network factors. This would produce the most valuable theoretical data; however, it would be of little use in the real world. We also decided that the data produced might not carry much weight, because BitTorrent already works very well in practice, and if we modeled network behavior incorrectly, we would get contrived results that might not accurately reflect real-world performance. A third choice was to take an existing BitTorrent client and modify it to implement our file transfer protocol while remaining backward compatible with the existing BitTorrent protocol. This approach would give us the most practical application, with the best chance of spreading quickly throughout the p2p community. Should this approach yield a more robust network and a gain in performance, there is no reason end users would not be willing to adopt this protocol quickly and then see a speedup in their file transfer rates. The main drawback of this approach is that it is difficult to run simulations, because it is hard to set up a distributed network of many different clients, all running with different properties. This is the trade-off for real-world usefulness. Originally, we thought it would be best to choose the simulation-based approach so that we could easily vary different factors of the network and measure when our protocol would be most effective. It would then be possible to run file transfers during which the number of peers, number of seeds, and peer volatility all varied. We soon realized, however, that since BitTorrent worked very well when there were a large number of seeds, we needed to attack the situation in which it did not work well.
This was the case when there were few seeds, a large number of leeching peers, and certain blocks were rare in the network. To attack this problem, we chose the third option: taking an existing BitTorrent client and modifying it to use our protocol in the real world. This gives us the advantage of a real-world implementation, and it is still easy to simulate a network in which there are limited seeds but multiple peers. At the end of last semester, our plan was to modify the BitTorrent implementation called Azureus. We chose it for a number of reasons: it was open source, it was written in Java (a language we were both very familiar and comfortable with), and it provided a great front end that included performance analysis and measurements. After working with Azureus for a couple of weeks, we decided that it was too bulky and extremely complicated. Much of the protocol functionality was obscured by large amounts of front-end code, and working with it was more effort than it was worth. Instead, we decided to modify the original, and much more widely used, BitTorrent client. We had originally avoided it because it was written in Python, which neither of us had experience with, but after spending a few days with Python we found this implementation to be much more lightweight, efficient, and easier to work with. The protocol was well defined and easily observed in the code, and it was not long before we were comfortable making changes without breaking anything.

The first, and most important, step in our implementation was to write an encoding/decoding class, which implements the bulk of our protocol (Section 3.2). This class sits completely outside the existing BitTorrent code, because that code had nothing to do with encoding and decoding. We defined a fairly straightforward interface for this class so that the original code could request random encodings and request the hash values of these encodings, which are later used for data verification. The encoding trees themselves are also generated here, based solely on the number of original blocks and the number of trees desired. When a client receives a new encoding from one of its peers, it passes the encoding into this class, which then computes the closure of what can be decoded from this new information. Sometimes nothing new can be decoded, but at other times there is a trickle effect down the tree and multiple pieces of data can be recovered. Furthermore, when a peer notifies another peer of a newly obtained piece, the second peer can compute the closure to learn all pieces available to that peer while keeping overhead low. After the interface to this class had been defined and implemented, we modified the rest of the BitTorrent code to operate on encodings rather than just plain blocks. There are very few places in the code where received data is actually handled; it is essentially stored until the end of the transfer, at which point it is pieced back together. However, there are many places where the code keeps track of which pieces it has and which pieces its peers have. This information is used to make requests and to answer requests from peers.
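The hash side of that interface might look like the following Python sketch. It assumes encodings are bitmasks over equal-length blocks and uses a hypothetical index layout (basic blocks first, then internal encodings in an agreed order), in the spirit of the ordering the .torrent file defines; the function names are illustrative, not the client's actual code. Only `hashlib.sha1` is a real library call.

```python
import hashlib


def encode(blocks, mask):
    """XOR together the equal-length blocks whose bits are set in `mask`."""
    out = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            for j, byte in enumerate(block):
                out[j] ^= byte
    return bytes(out)


def piece_table(n, internal_masks):
    """Hypothetical index layout: indices 0..n-1 are the basic blocks, and
    indices n and above name internal encodings in a fixed, agreed order."""
    return [1 << i for i in range(n)] + list(internal_masks)


def piece_hashes(blocks, table):
    """One 20-byte SHA-1 digest per allowable encoding, published up front."""
    return [hashlib.sha1(encode(blocks, mask)).digest() for mask in table]


def verify(index, data, hashes):
    """Check a received piece (basic or encoded) against its published digest."""
    return hashlib.sha1(data).digest() == hashes[index]


# Four blocks plus the internal nodes of one unpermuted tree.
blocks = [bytes([i]) * 4 for i in range(4)]
table = piece_table(4, [0b0011, 0b1100, 0b1111])
hashes = piece_hashes(blocks, table)
print(verify(6, encode(blocks, 0b1111), hashes))   # True: genuine root encoding
print(verify(6, b"\x01" * 4, hashes))              # False: corrupted data
```

Because every allowable encoding has a precomputed digest, a receiver can reject a corrupted encoded piece before feeding it into the decoder, exactly as it would reject a corrupted basic block.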
All of this bookkeeping code needed to be modified so that each client could keep track of which encodings it had (or could generate from other information) and which encodings its peers had available. Requests could now be made either for a single piece or for an encoding. The protocol itself did not need to be modified. Instead, we defined an ordering for the new encoded blocks, determined by a list of permutations of basic blocks within the .torrent file. Notifications and piece requests are thus carried out as before, using indices beyond the original number of blocks. Supporting this required changes to many different sections of the code. One of the first stages of the BitTorrent process is creating a .torrent file at the tracker (the machine responsible for distributing the list of peers and metadata about a file transfer). Every peer in the network needs the same .torrent file, which contains information about the file being transferred, including hash values for each piece so that peers can verify that they received correct data. We modified the creation of this .torrent file so that it now includes hash values for all allowable encodings. This is the largest overhead we add to the original protocol, because a .torrent file is now about T + 1 times as large as the original, where T is the number of trees used (a good value for T is between 2 and 4). This is acceptable, however, because the .torrent file is small to begin with. Other sections of the code that we changed include those responsible for picking pieces to request, responding to piece requests, storing received data, and piecing together data upon completion.

There are multiple versions of the client, all of which implement our code: a GUI that runs on the X Window System, a console version, and a console-curses version. We modified the output of the curses version to display piece-by-piece transfer progress, so that we can accurately monitor the decoding process in real time. Currently we output all blocks, and those that have been received at the current peer are marked. When an encoding is received, the closure of what can be decoded is computed, and anything that gets decoded is also marked. The percentage of the original file that has been recovered is then updated on the curses display. Sometimes no new original data is recovered when an encoding is received, but when there is a trickle-down decoding effect, we can recover multiple pieces after receiving only one new encoding.

As the implementation currently stands, we are satisfied: it works well on a real network and effectively transfers files from peer to peer. The encoding class appears to work very well, and because of its well-defined interface, it would not be difficult to make changes or to test a new scheme. As always, there were a few implementation obstacles that we had to overcome; they are addressed in the following section.

Obstacles

In trying to design a working protocol, we came upon multiple obstacles. We realized fairly early on that data verification would be a significant problem if we were to allow any random encoding to be used.
BitTorrent only transfers original blocks of the file, so in that scheme any piece of data can be verified with a SHA-1 hash and the receiver knows right away whether he received a valid block. In the Digital Fountain scheme for one-to-many transmission, the sender is assumed to be trusted, so data verification is not necessary. In our scheme, however, if we allowed any random encoding to be transferred from anybody in a peer group to anybody else, the receiver would need access to 2^n − 1 hash values (one per nonzero encoding) to be sure that what he received was a valid piece of data that would not corrupt the file. To solve this problem, we limit the number of valid encodings to a constant times the number of blocks. When setting up the tracker, the seed determines these cn allowable encodings from the number of trees being used and the random permutations of the original tree, and generates hash codes for all cn encodings. This increases the size of the initial handshake between the tracker and a peer by a factor of c, but it also increases the amount of useful data that a peer can receive. After a peer receives a new encoding, we need to be able to determine quickly whether it is a useful piece of information: has this peer already recovered all the information that this new block contains? If it turns out that this block is not useful, we want the user to be able to discard it quickly without wasting time trying to decode information from it.

The other large obstacle we were fighting with after the first semester was recoding. We would like to allow intermediate nodes to recode new symbols on the fly, based on blocks they were able to recover during intermediate steps of the file transfer. This introduces new encodings into the network, making it far more robust. While we decided not to allow random recoding, because such encodings would not correspond to a node in our decoding tree, we do allow peers to create encodings from the tree for which they did not actually receive the original data. This has a positive effect in a network where there are very few seeds and peers may have disparate data: peers can then generate encodings that were not previously circulating in the network.

The technical obstacles we had to overcome included learning Python, learning a new (rather large) system, and, of course, debugging. While the existing code was fairly easy to follow in the overall sense, the details were often very hard to grasp because of Python's lack of explicit type declarations and the lack of comments. While running simulations and debugging, it was also difficult to know whether failures were due to our protocol or to network issues handled in completely different parts of the BitTorrent code, which we never had to touch.

4 Results

We largely focused on the random cover time as a criterion for a successful encoding scheme. Given a highly dynamic network in which peers come and go at random while few seeds remain, we would like to maintain a good spread of useful information in the network as a whole, thus creating a robust environment despite unstable conditions.
The random cover time approximates this by focusing on a single peer and measuring how well that peer can piece together the original file given completely random information. For our experiments, we simulated a single client blindly requesting random information from the network. We used a constant n = 2048 blocks throughout our tests, which corresponds to a BitTorrent download of a 512 MB file with a block size of 256 KB. The peer and the network were not allowed to communicate which pieces each could decode at a given time. We ran 10 simulations for each value of T from 0 to 8.

Two criteria were of main interest. The first was the number of useful, or nonredundant, blocks that remain in the network at a given time, where time is measured by the number of useful blocks the client has received. For T = 0, this is a decreasing linear function with slope −1. The second, and most important, was the number of redundant blocks the network transmitted to the client. The two are clearly related: the more useful blocks exist in the network, the more likely we are to retrieve one.

Figure 2(a) depicts a single simulated run for the unencoded case. Figure 3(a) depicts the number of useful blocks that remain in the network when only one more useful encoding is required. Since having T trees adds O(Tn) allowable encodings to the network, we scale this count by T so that it becomes a fraction out of n. Each data point is an average over 10 runs. For T = 0 the value is a constant 1, since only one block will allow us to complete the file. Finally, Figure 3(b) graphs the number of redundant blocks received for various values of T. Between T = 2 and T = 3, fewer than half of the blocks received were redundant.

Compared with BitTorrent, adding robustness to the network also adds some overhead. First, for T > 1 we must transmit the permutation of each tree other than the first (which can be left unpermuted); each permutation can be represented as n integers. We must also transmit the hash codes of the O(Tn) allowable encodings, each of which is 20 bytes. Finally, we add O(log T) bits to the index in each piece request or have notification. A value of T = 2 or T = 3 represents a good trade-off between the required overhead and the added network robustness. Furthermore, with this relatively low number of trees, the reception overhead of the encoding scheme (needing more than n useful blocks) remains very low, and the improvements to the network's robustness make the trade-off worthwhile. Moreover, by a more careful inspection of the tree, it is possible to determine which blocks can be used to decode our basic blocks, thus avoiding this overhead altogether, at the cost of a slower decode time.

5 Future Direction

While this was a year-long project, and it is now complete, there are a couple of directions this project could take if it were continued.
The most interesting area of research would be other techniques for deciding which piece the piece picker requests from a peer. Currently, we use randomization to select an encoding from a list of possible encodings; the peer then immediately decides whether this random encoding provides useful information and, if it does, requests it. Other approaches are possible here, such as closure analysis to determine which piece would be most useful at a given time. If any of a peer's neighbors has that piece, it would be most beneficial to request it as soon as possible.

Another goal that could be accomplished in the near future is to make this program fully interoperable with existing BitTorrent clients. Using our implementation alongside a current implementation would not yield any performance gain for the moment, so we did not spend time on interoperability, as it was not the focus of our research. Nonetheless, interoperability would be a very important feature for real-world adoption of our program. Only minimal changes would be needed to interact with the original protocol; this could probably be accomplished with a few hours of programming to ensure that a peer running our implementation never requests an encoded piece from a peer running the original implementation.

Figure 2: Cover time samplings for various values of T. (a) Cover time for the unencoded case: random pieces are requested until a file of 2048 blocks can be reconstructed. Pink: the number of useful (nonredundant) blocks in the universe of allowable transmissions. Blue: the number of redundant blocks transmitted, as a function of the number of nonredundant blocks received. (b) Cover time for the 2-Tree encoding scheme (T = 2). (c) T = 5. (d) T = 8.

Figure 3: Trends over varying values of T. (a) The fraction (out of n = 2048) of useful blocks remaining in the network immediately before the final block transmission, averaged over 10 trials for each value of T. (b) The average number of redundant blocks received while requesting random blocks of a file with 2048 total blocks, over 10 trials for each value of T.

Figure 4: The rate of the Twisted Tree encoding scheme for various values of T, averaged over 10 trials. Here, the rate is measured as the percentage of non-redundant blocks received beyond 2048, the size of the file in question.

6 Conclusion

When we began, we knew that there was room for improvement in current peer-to-peer protocols, but we were not certain which aspect of performance would be best to tackle. After the first semester of research and analysis, it became evident that the best area to improve on was robustness in a network where many peers come and go and few seeds hold the entire file. Our system improves on the current BitTorrent protocol in this respect by ensuring that information about every block of a file is dispersed through the network as quickly as possible. Given more time and resources for testing, we would have liked to run in-depth tests with many peers, but we feel that our project was successful: we completed most of the work we set out to accomplish in September, kept to our timeline, and are happy with our implementation.
