What is bittorrent?

Bittorrent is a popular p2p file sharing protocol that is typically used to transfer large files. It makes use of the upload bandwidth of the downloaders to reduce the bandwidth requirements on the original file source. For more information, please visit http://www.bittorrent.com

Why use a coordination protocol?

The idea behind developing a coordination protocol for bittorrent is to improve its effectiveness as a content distribution mechanism. It is important to note that bittorrent does not provide any mechanism for content discovery, which perhaps suggests that the authors intended the protocol to be used as a distribution mechanism rather than as an anonymous p2p system. Though bittorrent is already being used successfully to distribute large files, its effectiveness can potentially be enhanced by mechanisms which allow the clients to trust each other and hence achieve better overall performance than each of them would working on their own. The coordination protocol attempts to achieve just that. For more details, please read the questions below.

Do we really need something like this? Is it worth the extra effort?

Most bittorrent users have at some point come across the problem of "no seeders". This situation occurs when a group of users is trying to download a file but the original file source is no longer connected to the network and the clients, among themselves, do not have all the pieces required to reconstruct the whole file. It typically occurs if the file source is on a slow or inconsistent connection, or if the source has manually disconnected from the network due to bandwidth constraints etc. In this situation the people currently downloading the file can only hope to complete the download if someone who has the complete file joins the network to make the missing pieces available. If this does not happen, the bandwidth used by the clients to download the partial file goes to waste. The coordination protocol can significantly reduce the probability of this situation arising, especially if the clients involved have an external trust relationship. For more details, please read the questions below.

If the problem of "no seeders" is so prevalent, why hasn't anyone done anything about it?

There are some systems currently in place which try to overcome this problem. However, they also bring some new issues. The most popular solution is the use of "private trackers". If you do not know what a tracker is, you can find more information about bittorrent and its system at http://www.bittorrent.com. A private tracker works by requiring all users to register with the tracker before using its services. All users need to be authenticated and logged into their account before the tracker responds with a list of peers for a particular torrent file. This provides some accountability for each user, so users can avoid downloading files from sources who have a history of disconnecting from the network prematurely. It also provides a medium for users to contact others who have previously downloaded the file but are currently not connected to the network. Thus a user with the complete file can be asked to rejoin the network for some time to make the missing pieces available.

That was the good news; now let's look at the drawbacks. By its very nature, the tracker software requires all users to be logged in before they are allowed to use the tracker services. This means that the original file source must also be registered with the same tracker site, which limits the origin of the file to the group of users registered with that particular site. A user who wishes to download a file from another site will first have to register there before being allowed access to its tracker. Since there are hundreds of private tracker sites available on the internet, with more being added each day, this is not very convenient.

So what does the coordination protocol do?

The coordination protocol attempts to improve the effectiveness and robustness of the bittorrent protocol by trying to ensure that a coordinated group becomes "self sufficient" in the minimum amount of time. By self sufficient I mean that the group has all the pieces required to reconstruct the complete file. It essentially advises each client on which parts of the file to download, trying to ensure that the clients in a coordinated group always download mutually exclusive parts of the file. This equates to the clients pooling their bandwidth, so all the pieces of the file become available within the group in the minimum amount of time. Once the group reaches self sufficiency, it doesn't matter even if the other sources of the file disappear completely, as the clients can share the pieces among themselves and reconstruct the complete file.

How does it work? Give me an example

When a user initiates a coordination task, the coordination server reads the torrent file and divides the pieces of the file into ranges. A coordination task can only support a fixed maximum number of users, which is specified during initiation, and this number is equal to the number of piece ranges created for the task. These piece ranges can be in one of three states at any given time:

• Unavailable: Indicating that no user in the group is currently trying to download pieces in this range.
• Downloading: Indicating that at least one user in the group is currently downloading pieces in this range.
• Completed: Indicating that the pieces in this range are available within the group.

When a client requests a piece range, the server gives priority to unavailable ranges. If no such ranges are left, it assigns a range in the downloading state on the basis of the number of users downloading the range. Finally, if all the pieces are in the completed state (i.e. the group is self sufficient), the server informs the client that it does not need to take part in the coordination process. The server also handles users quitting the coordination task, which makes previously available pieces unavailable. For a more detailed explanation of the server behaviour, please read my paper on the subject here.

To give an example of how the system is useful, consider three groups of clients. The first group has a single client on a 64Kbps pipe, the second is a group of 16 64Kbps clients and the third is a single client with a 1Mbps pipe. Let us assume that these three groups are involved in downloading a file with 16 pieces using bittorrent. Let us also assume for simplicity that all the clients involved in the bittorrent transfer treat each other equally and upload to every other client without prejudice. The download process may now continue as shown below:
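To put rough numbers on this scenario, the sketch below computes how long each group needs before it holds all 16 pieces. The piece size of 256 KB and the convention that 1Mbps equals 1024Kbps (so that 16 clients at 64Kbps add up to exactly 1Mbps) are illustrative assumptions, not values taken from the protocol, and the calculation ignores protocol overhead and any upload limit at the source. The function name seconds_to_cover is hypothetical.

```python
PIECES = 16          # from the scenario above
PIECE_KBITS = 2048   # assumption: 256 KB per piece = 2048 kilobits


def seconds_to_cover(clients, rate_kbps, coordinated=True):
    """Seconds until the group as a whole holds every piece.

    coordinated=True  -> clients fetch mutually exclusive pieces, so
                         their download rates add up (best case).
    coordinated=False -> all clients fetch the same pieces in lock step,
                         so the group is no faster than one client
                         (worst case).
    """
    effective_rate = rate_kbps * clients if coordinated else rate_kbps
    return PIECES * PIECE_KBITS / effective_rate


single_slow = seconds_to_cover(1, 64)                       # group 1
group_best = seconds_to_cover(16, 64, coordinated=True)     # group 2, best case
group_worst = seconds_to_cover(16, 64, coordinated=False)   # group 2, worst case
single_fast = seconds_to_cover(1, 1024)                     # group 3

print(single_slow, group_best, group_worst, single_fast)
# prints: 512.0 32.0 512.0 32.0
```

With these assumed values the group of 16 needs somewhere between 32 and 512 seconds, depending on how well its members avoid downloading the same pieces.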
Now we can draw two major conclusions from the above scenarios:

• The time it takes for the second group of 16 64Kbps clients to obtain all the pieces of the file among themselves is always less than or equal to the time it takes for the first group of a single 64Kbps client to download the complete file.
• The time it takes for the second group to obtain all the pieces of the file is always greater than or equal to the time it takes for the third group of a single 1Mbps client to download the complete file.

It is important to note that these two conclusions indicate the worst and best case scenarios, respectively, for the second group in terms of how well the download is coordinated within the group. The best case occurs when all 16 clients download mutually exclusive pieces, so that their combined bandwidth enables them to download the file at the speed of a single 1Mbps client. The worst case is where all 16 clients download the same pieces in lock step, essentially reducing the performance of the group to that of a single client with a 64Kbps pipe. Though achieving the best case in the real world is impractical, the coordination process attempts to lift the overall performance of the group closer to it.

Ok, so the system could be useful. But what are its drawbacks? Will my download speeds suffer if I use this system?

Though the primary goal of the system is to provide more resilience to the bittorrent system and avoid the problem of "no seeders", care has been taken to ensure that this does not affect performance. Your download speeds will not be hampered, because the priority mechanism we are using ensures that your downloads always proceed at their maximum potential. For more details about the priority mechanism, please read my paper here. The only drawback of using the system is the overhead of the communication between the coordination tracker and the clients.
However, since these are simple text messages in XML, this overhead is insignificant compared to the actual data transfer required to download the file.

Does the system improve the performance of bittorrent?

Though the primary goal of the system is to provide more resilience to the bittorrent protocol, some performance advantage can be achieved, mainly because of the way the bittorrent system works. The download speeds you get in your bittorrent client depend not only on your download bandwidth but also on the upload bandwidth of the peers you are connected to. Most people restrict the upload bandwidth for their bittorrent transfers so that it does not compromise any other transfers they may be running on their computers. Thus even if you have a very fast connection, you may not get good speeds because of the limitations at the other end. In such cases, your speeds tend to improve if you are connected to more clients, as you can potentially download at the combined upload bandwidth of those clients. However, this advantage can only be felt if the new peer you connect to has pieces of the file that you are interested in. In the regular bittorrent system, since there is no coordination between the peers, each client selects the pieces it wishes to download using the rarest first strategy, based only on the local information available to it (i.e. the clients it is connected to; if you do not know how this works, please visit http://www.bittorrent.com). This could lead to several clients having the same pieces, so the pieces that you are interested in may be held by fewer peers. Since the coordination protocol breaks the file down into piece ranges, each client in a coordinated group ends up acting as a source for the piece range it has successfully downloaded. Thus when a coordinated group is self sufficient, the clients in the group act as what I term a "Virtual Seeder". Please see the diagram below to get a clearer picture.
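The piece-range bookkeeping behind this, described under "How does it work?", can be sketched roughly as follows. This is a minimal illustration in Python and not the actual server code (which runs on PHP): all names here (PieceRange, split_into_ranges, assign_range, complete_range, quit_task) are hypothetical, the even split into contiguous ranges is an assumption, and picking the downloading range with the fewest users is only one plausible reading of "on the basis of the number of users downloading the range".

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    UNAVAILABLE = "unavailable"
    DOWNLOADING = "downloading"
    COMPLETED = "completed"


@dataclass
class PieceRange:
    first: int                                       # first piece index
    last: int                                        # last piece index (inclusive)
    downloaders: set = field(default_factory=set)    # users fetching this range
    holders: set = field(default_factory=set)        # users who finished it

    @property
    def state(self):
        if self.holders:
            return State.COMPLETED
        if self.downloaders:
            return State.DOWNLOADING
        return State.UNAVAILABLE


def split_into_ranges(num_pieces, max_users):
    """One contiguous range per allowed user (assumed to be an even split)."""
    if not 0 < max_users <= num_pieces:
        raise ValueError("need between 1 and num_pieces users")
    base, extra = divmod(num_pieces, max_users)
    ranges, start = [], 0
    for i in range(max_users):
        size = base + (1 if i < extra else 0)
        ranges.append(PieceRange(start, start + size - 1))
        start += size
    return ranges


def assign_range(ranges, user):
    """Unavailable ranges first, then a downloading range; None if all done."""
    unavailable = [r for r in ranges if r.state is State.UNAVAILABLE]
    downloading = [r for r in ranges if r.state is State.DOWNLOADING]
    if unavailable:
        chosen = unavailable[0]
    elif downloading:
        # Assumption: prefer the range with the fewest current downloaders.
        chosen = min(downloading, key=lambda r: len(r.downloaders))
    else:
        return None  # every range is completed: the group is self sufficient
    chosen.downloaders.add(user)
    return chosen


def complete_range(rng, user):
    """Record that `user` now holds every piece in `rng`."""
    rng.downloaders.discard(user)
    rng.holders.add(user)


def quit_task(ranges, user):
    """A user leaving may turn a range they held back into unavailable."""
    for r in ranges:
        r.downloaders.discard(user)
        r.holders.discard(user)
```

Deriving the state from the two user sets, rather than storing it, means a range falls back to unavailable by itself once its last holder and last downloader have quit.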
Since the upload potential of the virtual seeder is the combined upload bandwidth of all the clients, it is generally better than a single seeder with limited upload bandwidth.

I am behind a NAT. Will I have any problems coordinating with other people behind NATs? / Isn't the system a little too simple? Is it really going to work?

First of all, you will have no problem with NATs, simply because the coordination server is a web server with a public IP address. So you can create a coordinated group with your friends behind NATs and the system will work fine. The reason why I have tried to keep the system simple is that the coordination system is essentially a means to an end. Running the coordination code on a web server also means that the system should require minimum CPU time, as the coordination tracker may be tracking hundreds of tasks at any given time. I decided on this design after much consideration of the advantages and drawbacks of various designs. The primary motivating factor behind running the coordination code on a server rather than on one of the clients is to solve the NAT problem. I found that running a more elaborate system of coordination on the client had minimal advantages in any case. Currently the Coordination Tracker can be run on any shared hosting server with PHP support. Though the server currently uses MySQL, this is only for elementary data and it can easily be modified to store the data in files. If you have any suggestions or feel a different method may be more appropriate, feel free to email me or post a message on our forums at http://www.voidbots.net/forums.

Since the coordination code is run on a web server anyway, why not run it on the tracker itself? Why do you want to use a separate server?

There are a couple of reasons for doing this. Firstly, a tracker keeps track of ALL the users trying to download a file.
Depending on the size of the file, it is only meaningful to coordinate a transfer among a suitable number of users. For example, it may be useful to coordinate a transfer of a 1GB file with about 20 users. However, using 20 users to coordinate a 10MB transfer would be meaningless. Even if we decided to restrict the maximum number of users that can join the coordination process, it would still leave us with a single coordination group per torrent. With the current system, there can be several coordination groups in a single swarm as shown below.
Though it is possible to design some mechanism allowing users to create different coordination tasks for the same torrent on the same tracker server, it would still put all the load on a single server and provide no mechanism to distribute it. However, if you have a new idea which you feel will work better than the current mechanism, I would love to hear about it. Please post your suggestions on our forums at http://www.voidbots.net/forums.

I am interested in trying out the system. Where can I get the client and server software?

First of all, I would like to apologize for the lack of documentation for the client and server. I will work on it as soon as I have the time. I have currently developed two clients, one for Windows and the other for Linux. The Linux client is a command line client which uses ncurses. You can download the Linux client at http://vsagesoftware.com/vct/vct.zip . If you do not have the libboost multithreading runtime library, you can get it at http://vsagesoftware.com/vct/libboost_thread.so . The Windows client is a modification of the arctic bittorrent client and supports multiple simultaneous downloads. You can download the Windows client at http://vsagesoftware.com/vct/vsage.exe . This is a debug build in VC++ .Net 2003 and also uses several libraries. I have put up links to all the dll files you may need at http://vsagesoftware.com/vct/dlls . If you still can't get the program to run or are having any other problems, feel free to email me or post a message on our forums at http://www.voidbots.net/forums.

Now all you need to do is register with the coordination server at http://www.voidbots.net. Once you are logged in, click on the "New Task" link if you wish to create a new coordination task, and follow the instructions on the site to initiate it. If you wish to join an existing task, click on the "Browse Tasks" link on the main page and select the task you wish to join.
Click on the "Join this coordination task" link and proceed to join the task. If you have any problems using the system, email me or post your problem on the forums.

I am interested in this project. How can I contribute?

If you are proficient in C, C++ or PHP and would like to code for this project, please mail me with your details. You can also help me by providing test clients/bandwidth to try out the system. Essentially, the best way to help me would be to use this system in all your bittorrent downloads, since the coordination system will not hamper your download speeds in any case.

I have a query which has not been answered on this page. Should I email you my questions?

You can definitely email me to get your questions answered and I will try to reply as soon as possible. However, it would be more helpful for me if you post them on our forums at http://www.voidbots.net/forums, as this will also help others who have the same queries and save me from having to answer each question individually.