THE BITTORRENT MULTITRACKER PROJECT
===================================

by John Hoffman

These are the working notes for an experimental project for adding backup and redundant tracker capabilities to the BitTorrent protocol, with experimental code based on the BitTorrent reference client.

PART 1: Metafile (.torrent) changes
PART 2: Client changes
PART 3: Tracker changes

* * *

PART 1: Metafile (.torrent) changes

The proposed spec adds a new key to the bencoded data in the metafile, "announce-list". This key contains a list of lists ("tiers") of tracker URLs. If the key exists and the client supports this feature, the standard "announce" key is ignored. The client is to randomize the trackers within each tier, then go through the tiers in order, looking for a functioning tracker; when it successfully reads data from one, it stops. This structure allows for a very flexible, controllable ordering of trackers.

Examples:

announce-list = [ [ 'tracker1' ], [ 'tracker2' ], [ 'tracker3' ] ]
    First try tracker1, then tracker2, then tracker3.

announce-list = [ [ 'tracker' ], [ 'backup1', 'backup2', 'backup3' ] ]
    First try tracker; then, if it doesn't respond, try randomly among backup1, backup2 and backup3.

announce-list = [ [ 'tracker1', 'tracker2', 'tracker3' ], [ 'backup' ] ]
    First try randomly among tracker1, tracker2 and tracker3; if none of them respond, try backup.

* * *

PART 2: Client changes

* Revision 1 of the test client has the following changes:

download.py was modified to read the new announce list, or to create one from the standard announce value, and to feed it to Rerequester.py in place of the single tracker URL. See:
http://www.shambala.net/misc/bt-multitracker.download.py.htm

Rerequester.py was modified to read the announce list instead of the single tracker URL, to shuffle the trackers within their tiers once at startup, and to step through the trackers in that order on each rerequest. See:
http://www.shambala.net/misc/bt-multitracker.Rerequester.py.htm

- - -

The current change to Rerequester.py is inadequate, and class Rerequester needs significant modification to be sufficiently robust and to properly report error conditions.

The largest problem is that the routine that connects to the tracker(s) and reads the information (rerequest) currently calls another routine asynchronously to process that information (postrequest). An error that crops up during processing is therefore not detected by rerequest, and no further trackers are polled. This is especially egregious if the first tracker, or an early tracker, in the list consistently sends bad data: the client gets stuck on that tracker and cannot recover, despite having the URLs of other, working trackers on its list.

Another problem is that communications timeouts are handled in a separate function (announce) which calls the rerequest function. This also needs to be fixed.

Lastly, there are some communications situations that rerequest does not handle gracefully; instead they raise unhandled Python exceptions, and these need to be caught to improve the client's robustness. Otherwise, for instance, a tracker URL returning a 404 error page rather than regular tracker data would stop the remainder of the tracker list from being parsed.
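To make the intent concrete, here is a rough sketch of what a more robust rerequest loop could look like, written against the modern Python standard library rather than the reference client's actual code. It shuffles the trackers within each tier once, then steps through the tiers in order, decoding and processing each response synchronously so that any failure (a timeout, an HTTP error page, junk data, or an error during processing) simply falls through to the next tracker. The names shuffle_tiers, decode_response and handle_response are stand-ins, not functions from Rerequester.py; decode_response is only a placeholder for the client's bencode decoder.

    import random
    from urllib.parse import urlencode
    from urllib.request import urlopen

    class TrackerError(Exception):
        pass

    def shuffle_tiers(announce_list):
        # Shuffle inside each tier once, up front; the tier order itself
        # is preserved, as the spec requires.
        tiers = [list(tier) for tier in announce_list]
        for tier in tiers:
            random.shuffle(tier)
        return tiers

    def decode_response(data):
        # Placeholder for the client's bencode decoder: anything that is not
        # a bencoded dictionary (an HTML 404 page, for instance) is rejected
        # here instead of blowing up later during processing.
        if not (data.startswith(b'd') and data.endswith(b'e')):
            raise ValueError('not a bencoded dictionary')
        return data   # a real decoder would return the parsed dictionary

    def rerequest(tiers, params, handle_response, timeout=30):
        # Step through the tiers in order and, within each tier, through the
        # already-shuffled trackers.  The response is decoded and handled
        # right here, synchronously, so that a bad or unreachable tracker
        # cannot wedge the client; every failure moves on to the next URL.
        last_error = None
        for tier in tiers:
            for url in tier:
                try:
                    with urlopen(url + '?' + urlencode(params),
                                 timeout=timeout) as conn:
                        data = conn.read()
                    response = decode_response(data)
                    handle_response(response)
                    return response
                except Exception as err:
                    # Timeout, HTTP error, junk data, or a processing error:
                    # remember it and try the next tracker.
                    last_error = err
                    continue
        raise TrackerError('no working tracker found (last error: %s)' % last_error)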
* * *

PART 3: Tracker changes

No change to the current tracker specification is necessary to handle backup tracker operations.

This extension does, however, open up the possibility of sharing the load of a torrent between several co-equal trackers, and in that case some changes are desirable. Without them, each tracker would have its own pool of clients, and the pools would be unable to share with each other; the pool on one tracker might not be as well seeded as another's. It is therefore desirable for the trackers to be able to communicate with one another and mix their pools of clients.

It is also desirable to keep the statistics on the torrent consistent. Simply adding up the data reported by the trackers may be insufficient, since clients that have connected to more than one tracker will have their statistics duplicated. This effect would be small on a cluster of stable trackers but could be jarring where communications aren't stable.

* Revision 1 of the tracker changes for peer pool mixing:

The trackers will talk with each other mostly as if they were peers on the torrent. A tracker will go through the metafiles in its allowed_dir, contacting every tracker (other than itself) in each metafile. (It will do so with the option "event=stopped", so its own IP is never added to the peer list.) It will harvest the peer information returned and cache it locally, mixing it in with the peer IDs and addresses of the clients it handles directly when answering announces from those clients.

How often the trackers make requests to each other should be governed by the amount of time it takes for peer information to expire from the tracker. Trackers should re-request in one tenth that amount of time. For instance, if it takes 15 minutes for a peer ID from which an announce has not been received to drop off the tracker's list, then the trackers should contact each other every 1.5 minutes. Peer information received from another tracker should expire in 3 times the interval at which the trackers re-contact each other; in the above example, a peer ID received from another tracker would expire after 4.5 minutes if it wasn't re-received.

Given a response quantity of 50 peers, this means that each tracker will hold as many as 150 other peers' data (three refreshes' worth) from each co-equal tracker it connects with. In addition, the smaller the number of peers on a tracker, the more of them will be represented in the harvested data; if the number is fewer than 50, all of them will be represented. So small torrents will mix completely, while large torrents will mix well enough. The rapid turnover will also result in many connections between peers on separate trackers persisting even after that peer information has expired from the trackers. Duplicated peer data would need to be removed, to keep peers which hit more than one tracker from being rewarded with more incoming connections.
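As an illustration of the caching and timing described above (and not of the actual tracker code), the following sketch keeps peers harvested from co-equal trackers in a cache keyed by peer ID, so duplicates are removed, expires them after three refresh intervals, and builds the query parameters for the harvesting request with "event=stopped". The class and function names are hypothetical, and the figures simply restate the 15-minute / 1.5-minute / 4.5-minute example.

    import time

    # With a 15-minute peer drop-off on the tracker itself, co-equal trackers
    # re-contact each other at one tenth of that (90 seconds), and peers
    # learned that way expire after three refresh intervals (270 seconds,
    # i.e. 4.5 minutes).
    PEER_EXPIRY = 15 * 60
    REFRESH_INTERVAL = PEER_EXPIRY // 10
    EXTERNAL_EXPIRY = 3 * REFRESH_INTERVAL

    class ExternalPeerCache:
        """Peers harvested from co-equal trackers, keyed by peer ID."""

        def __init__(self, expiry=EXTERNAL_EXPIRY):
            self.expiry = expiry
            self.peers = {}          # peer_id -> (peer_info, last_seen)

        def add(self, peer_id, peer_info):
            # Keying by peer ID removes duplicates, so a peer that announces
            # to several trackers gets no extra incoming connections.
            self.peers[peer_id] = (peer_info, time.time())

        def live(self, direct_peer_ids):
            # Drop expired entries, and leave out any peer this tracker is
            # already handling directly (its own record takes precedence).
            now = time.time()
            self.peers = {pid: entry for pid, entry in self.peers.items()
                          if now - entry[1] < self.expiry}
            return [info for pid, (info, _) in self.peers.items()
                    if pid not in direct_peer_ids]

    def announce_params(info_hash, own_id):
        # Query parameters for the harvesting request to a co-equal tracker.
        # "event=stopped" keeps this tracker's own address off the peer list;
        # own_id is this tracker's identifier (its proposed role is discussed
        # below).
        return {'info_hash': info_hash, 'peer_id': own_id, 'event': 'stopped',
                'port': 0, 'uploaded': 0, 'downloaded': 0, 'left': 0}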
The tracker should choose a tracker ID hash randomly when it is initialized, and this value should probably NOT be kept in the state data. Tracker request communications should also include the keys "tracker=1" and "peer_id=" (set to the tracker's own ID hash). This would have two functions.

First, if the ID hash received is identical to the hash the tracker itself is using, then the tracker is mistakenly contacting itself and should signal so in its response, so that this error condition can be repaired.

Second, it would identify the connection as being from a tracker, in which case two things would happen:

(1) The responding tracker would return a list of peers drawn only from the peer database it handles directly, avoiding duplicated effort in distributing peers around the network of co-equal trackers and preventing some peer IDs from being propagated incessantly.

(2) The responding tracker would also send a separate list of peers which had recently sent stop signals. This would help in the case of a peer which tried to send a stop but couldn't reach its primary tracker.

Neither of these changes would be of any benefit to a connecting peer, so there would be no reason for a peer to counterfeit the key.

Security: So long as the announce lists in the metafiles in the tracker's allowed_dir do not contain the addresses of any attacker's false trackers, there is no way for an attacker to affect the operation of a torrent. The presence of an attacker's false tracker address in one torrent's announce list should only affect the operation of that one torrent, unless an attacker finds a way to craft a response that induces an error condition in the tracker; this will need to be guarded against.

* Revision 1 of the tracker changes for statistics mixing:

No specific changes have been considered as yet. Changes may include code to send log data out via the internet to a new application designed to aggregate the data from multiple trackers.
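Since no statistics-mixing design exists yet, the following is only one possible shape for such an aggregator, included to illustrate the duplication problem noted at the start of Part 3: per-tracker snapshots are merged by peer ID, so a client that announced to more than one tracker is counted once. The snapshot format and all names here are invented for the example.

    def aggregate(snapshots):
        # Merge per-tracker statistics snapshots for one torrent.  Each
        # snapshot is assumed to be a dict of the form
        #     {peer_id: {'complete': bool, 'downloaded': int}}
        # reported by one tracker.  Counting by peer ID rather than simply
        # adding the trackers' totals keeps a client that announced to more
        # than one tracker from being counted twice.
        merged = {}
        for snapshot in snapshots:
            for peer_id, stats in snapshot.items():
                previous = merged.get(peer_id)
                # If two trackers disagree, keep the more advanced record.
                if previous is None or stats['downloaded'] > previous['downloaded']:
                    merged[peer_id] = stats
        return {
            'peers': len(merged),
            'complete': sum(1 for s in merged.values() if s['complete']),
            'incomplete': sum(1 for s in merged.values() if not s['complete']),
        }

    # Example: the same peer reported by two trackers is counted once.
    a = {b'peerA': {'complete': True, 'downloaded': 500}}
    b = {b'peerA': {'complete': True, 'downloaded': 500},
         b'peerB': {'complete': False, 'downloaded': 120}}
    print(aggregate([a, b]))   # {'peers': 2, 'complete': 1, 'incomplete': 1}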