Bit Torrent Protocol Seminar Report 2011

1. Introduction

Overview

BitTorrent is a peer-to-peer file sharing protocol used to distribute large amounts of data, and it is one of the most common protocols for transferring large files. It makes such transfers easier by taking a different approach from conventional file transfer methods: a user does not just obtain a file but also shares it with others. A user can obtain multiple files simultaneously without any considerable loss of transfer rate. Because each user shares the file he is obtaining, other users who are trying to obtain the same file find it easier to get, and they in turn become involved in the sharing process. Thus, the larger the number of users who want a file, the more easily it can be transferred between them.

The BitTorrent protocol is built on a technology that makes it possible to distribute large amounts of data without a high-capacity server or expensive bandwidth. This is its most striking feature. The transfer never depends on a single source holding the original copy of the file; instead, the load is distributed across a number of such sources. Not only the sources but also the clients who want to obtain the file take part in the transfer. The load is thereby spread across the users, which partially frees the main source from this process and reduces the network traffic imposed on it. Because of this, BitTorrent has become one of the most popular file transfer mechanisms in today's world. Though the mechanism is not as simple as an ordinary file transfer protocol, it has gained popularity because of the sharing policy that it imposes on its users. Recent surveys made by various organizations show that 35% of overall Internet traffic is due to BitTorrent, which indicates how large a volume of files is transferred and shared through it.

1.1 History

BitTorrent was created by a programmer named Bram Cohen. After inventing this new technology he said, "I decided I finally wanted to work on a project that people would actually use, would actually work and would actually be fun". Before BitTorrent, other file sharing techniques existed, but they did not use bandwidth effectively, and bandwidth had become their bottleneck. Even peer-to-peer systems like Napster and Kazaa, which involved users in the sharing process, required only a subset of users to share files. Most users could simply download without needing to upload, which again put a heavy network load on the original sources and on a small number of users, and left the bandwidth of the remaining users unused. The main intention behind Cohen's invention was to make maximum use of the bandwidth of all users involved in sharing a file: every person who wants to download a file also has to contribute to uploading it. This novel concept gave birth to a new peer-to-peer file sharing protocol called BitTorrent. Cohen invented this protocol in April 2001.

The first usable version of BitTorrent appeared in October 2002, but the system needed a lot of fine-tuning. BitTorrent really started to take off in early 2003, when it was used to distribute a new version of Linux and fans of Japanese anime started relying on it to share cartoons. The most important aspect of the protocol is that it makes it possible for people with limited bandwidth to supply very popular files. This means that if you are a small software developer you can put up a package, and if it turns out that millions of people want it, they can get it from each other in an automated way. Thus bandwidth, which used to be a bottleneck in previous systems, no longer poses a problem.

2. Bit Torrent and Other approaches

2.1 Other P2P Methods

The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it; this is how both HTTP and FTP work. The clients only speak to the server, and never to each other. The main advantages of this method are that it is simple to set up, and the files are usually always available, since the servers tend to be dedicated to the task of serving and are always on and connected to the Internet. However, this model has a significant problem with files that are large or very popular, or both: it takes a great deal of bandwidth and server resources to distribute such a file, since the server must transmit the entire file to each client. Perhaps you have tried to download a demo of a newly released game, or CD images of a new Linux distribution, and found that all the servers report "too many users", or that there is a long queue to wait through. The concept of mirrors partially addresses this shortcoming by distributing the load across multiple servers, but it requires a lot of coordination and effort to set up an efficient network of mirrors, and it is usually feasible only for the busiest of sites.

Another method of transferring files has become popular recently: the peer-to-peer network, with systems such as Kazaa, eDonkey, Gnutella, Direct Connect, etc. In most of these networks, ordinary Internet users trade files by directly connecting one-to-one. The advantage here is that files can be shared without having access to a proper server, and because of this there is little accountability for the contents of the files. Hence, these networks tend to be very popular for illicit files such as music, movies, pirated software, etc. Typically, a downloader receives a file from a single source; however, the newest versions of some clients allow downloading a single file from multiple sources for higher speeds. The problem of popular downloads discussed above is somewhat mitigated, because there is a greater chance that a popular file will be offered by a number of peers. The breadth of files available tends to be fairly good, though download speeds for obscure files tend to be low. Another common problem sometimes associated with these systems is the significant protocol overhead for passing search queries amongst the peers, and the number of peers that one can reach is often limited as a result. Partially downloaded files are usually not available to other peers, although some newer clients may offer this functionality. Availability is generally dependent on the goodwill of the users, to the extent that some of these networks have tried to enforce rules or restrictions regarding send/receive ratios.

Use of the Usenet binary newsgroups is yet another method of file distribution, one that is substantially different from the other methods. Files transferred over Usenet are often subject to minuscule windows of opportunity. Typical retention times on binary news servers are often as low as 24 hours, and having a posted file available for a week is considered a long time. However, the Usenet model is relatively efficient, in that the messages are passed around a large web of peers from one news server to another, and finally fanned out to the end user from there. Often the end user connects to a server provided by his or her ISP, resulting in further bandwidth savings. Usenet is also one of the more anonymous forms of file sharing, and it too is often used for illicit files of almost any nature. Due to the nature of NNTP, a file's popularity has little to do with its availability, and hence downloads from Usenet tend to be quite fast regardless of content. The downsides of this method are that it involves a set of rules and procedures and requires a certain amount of effort and understanding from the user. Patience is often required to get a complete file, due to the nature of splitting big files into a huge number of smaller posts. Finally, access to Usenet often must be purchased, due to the extremely high volume of messages in the binary groups.

BitTorrent is closest to Usenet. It is best suited to newer files in which a number of people have an interest. Obscure or older files tend not to be available. Perhaps as the software matures a more suitable means of keeping torrents seeded will emerge, but currently the client is quite resource-intensive, making it cumbersome to share a number of files. BitTorrent also deals well with files that are in high demand, especially compared to the other methods.

2.2 A Typical HTTP File Transfer

The most common type of file transfer is through an HTTP server. In this method, an HTTP server listens to the clients' requests and serves them. A single server can handle many such clients and serve the requested file simultaneously to all of them. Here the client can depend only on the lone server that is providing the file, so the overall download is bound by the limitations of that server. This kind of transfer is also subject to a single point of failure: if the server crashes, the whole download process will cease. The file being served is available as one single piece, which means that if the download stops abruptly in the middle, the whole file has to be downloaded again. The BitTorrent protocol overcomes all these shortcomings and is therefore more robust, which is why many people choose it over this traditional method of file transfer.

Fig 2.1 : HTTP/FTP File Transfer

2.3 The DAP method

Download Accelerator Plus (DAP) is a popular download accelerator. DAP's key features include the ability to accelerate downloading of files over the FTP and HTTP protocols, to pause and resume downloads, and to recover from dropped Internet connections. On the Internet the same file is often hosted on numerous mirror sites, such as at universities and on ISP servers. DAP immediately senses when a user begins downloading a file and identifies available mirror sites that host the requested file. As soon as it is triggered, DAP's client-side optimization begins to determine, in real time, which mirror sites offer the fastest response for the specific user's location. The file is downloaded in several segments simultaneously through multiple connections from the most responsive server(s) and reassembled at the user's PC. This results in better utilization of the user's available bandwidth, and it ensures that each available mirror server is utilized to serve the users that most benefit. This in turn effects an efficient balancing of the load among available servers across the entire World Wide Web, and reduces download times for users while allowing them to receive maximum benefit from their available bandwidth. DAP's resume functionality and its ability to continue downloading even when one of the participating connections has dropped also provide users with a more reliable download experience.

2.4 The Bit Torrent Approach

BitTorrent has revolutionized the way files are shared between people. It does not require a user to download a file completely from a single server. Instead, a file can be downloaded from many users who are themselves downloading the same file. A user who has the complete file, called the seed, initiates the download by transferring pieces of the file to the other users. Once a user has a considerable number of such pieces, he too can start sharing them with other users who are yet to receive those pieces. This concept frees a client from depending completely on a server and also reduces the overall load on the server.

In BitTorrent, the data to be shared is divided into many equal-sized portions called pieces. Each piece is further sub-divided into equal-sized sub-pieces called blocks. All clients interested in sharing this data are grouped into a swarm, each of which is managed by a central entity called the tracker.
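The division into pieces and blocks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `piece_layout` is hypothetical, the 16 KiB block size is a common convention rather than something this report specifies, and the file and piece sizes are example values. The last piece of a file is usually shorter than the others, since file lengths are rarely an exact multiple of the piece length.

```python
def piece_layout(total_length, piece_length, block_length=16 * 1024):
    """Describe how a payload splits into pieces and blocks.

    block_length defaults to 16 KiB, which is an assumption made for
    this sketch and not a value taken from the report.
    """
    if total_length <= 0 or piece_length <= 0 or block_length <= 0:
        raise ValueError("all sizes must be positive")

    def ceil_div(a, b):
        # Integer ceiling division, exact for arbitrarily large files.
        return -(-a // b)

    num_pieces = ceil_div(total_length, piece_length)
    # Every piece is full-sized except possibly the last one.
    last_piece_length = total_length - (num_pieces - 1) * piece_length
    return {
        "num_pieces": num_pieces,
        "last_piece_length": last_piece_length,
        "blocks_per_full_piece": ceil_div(piece_length, block_length),
        "blocks_in_last_piece": ceil_div(last_piece_length, block_length),
    }


# Example values: a 38,190,848-byte file cut into 128 KiB pieces.
layout = piece_layout(total_length=38_190_848, piece_length=131_072)
print(layout)
```

With these example sizes the file needs 292 pieces, of which the final one holds only the leftover 48,896 bytes.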

Fig 2.2 : BitTorrent File Transfer

Each client independently obtains a file, called a torrent, that contains the location of the tracker along with a hash of each piece. Clients keep each other updated on the status of their download. Clients download blocks from other (randomly chosen) clients who claim they have the corresponding data. Accordingly, clients also send data that they have previously downloaded to other clients. Once a client receives all the blocks for a given piece, he can verify the hash of that piece against the hash provided in the torrent. Thus once a client has downloaded and verified all pieces, he can be confident that he has the complete data.

Both BitTorrent and DAP download files from multiple sources, and in both approaches the files are divided into pieces. But BitTorrent has many features that DAP does not, which has made it the more popular of the two. In BitTorrent the users participate actively in sharing files along with the servers. This is the uniqueness of this protocol. It does need a dedicated server, called the tracker, to handle the peers connected in the network. The file transfer in DAP takes place through the traditional HTTP or FTP protocols, which means that the transfer rate will always be limited by the servers' bandwidth. If these servers are flooded with requests, they break down and the transaction terminates. This is not the case in BitTorrent, since the whole process does not depend on servers alone. The load is distributed across the network between peers and servers. This makes BitTorrent far better than competitors such as DAP.

3. Working of BitTorrent

BitTorrent is a protocol with some complexity, where modeling is useful to gain a better understanding of its performance. BitTorrent's design makes it extremely efficient in the sharing of large data files among interested peers. As previously explained, BitTorrent scales well and is a superior method for transferring and disseminating files between interested peers while limiting free riding (peers who download but do not upload) between those same peers. Looking under the hood, BitTorrent is based on a "tit for tat" reciprocity agreement between users that ultimately results in Pareto efficiency. Pareto efficiency is an important economic concept that maximizes resource allocation among peers to their mutual advantage. Pareto efficiency is the crown jewel of BitTorrent and is the driving force behind the protocol's popularity and success. Cohen's vision of peers simultaneously helping each other by uploading and downloading has been realized by the BitTorrent system.

Fig 3.1 : A Typical BitTorrent System

The protocol shares data through what are known as torrents. For a torrent to be alive or active it must have several key components. These components include a tracker server, a .torrent file, a web server where the .torrent file is stored, and a complete copy of the file being exchanged. Each of these components is described in the following paragraphs.

The file being exchanged is the essence of the torrent, and a complete copy is referred to as a seed. A seed is a peer in the BitTorrent network willing to share a file with other peers in the network. Why seed owners choose to share their files is debatable, as the BitTorrent protocol does not reward seed behavior. In fact, some researchers believe the protocol lacks any incentive mechanism for encouraging seeds to remain in torrents. Some argue that the lack of incentive in the protocol is a fundamental design flaw that leads to the punishment of seeds. Peers lacking the file and seeking it from seeds are called leechers. While seeds only upload to leechers, leechers may both download from seeds and upload to other leechers. BitTorrent's protocol is designed so that leeching peers seek each other out for data transfer in a process known as "optimistic unchoking". Together, seeds and leechers engaged in file transfer are referred to as a swarm.

A swarm is coordinated by a tracker server serving the particular torrent, and interested peers find the tracker via metadata known as a .torrent file. Leechers and seeds are coordinated by the tracker server, and the peers periodically update the tracker on their status, allowing the tracker to have a global view of the system. The data monitored by the tracker can include peer IP addresses, the amount of data uploaded and downloaded by specific peers, data transfer rates among peers, the percentage of the total file downloaded, the length of time connected to the tracker, and the ratio of sharing among peers.

The role of .torrent files is to provide the metadata that allows the protocol to function. .torrent files can be viewed as surrogates for the files being shared. These .torrent files contain the key pieces of data needed to function correctly, including the file length, the assigned name, hashing information about the file, and the URL of the tracker coordinating the torrent activity. Torrent files can be created using a program such as MakeTorrent, another open source tool available under the free software model.

The first step in the BitTorrent exchange occurs when a peer downloads a .torrent file from a server. Since BitTorrent has no built-in search functionality, .torrent files are usually located via HTTP through search engines or trackers. When a .torrent file is opened by the peer's client software, the peer connects to the tracker server responsible for coordinating activity for that specific torrent. The tracker and client communicate by a protocol layered on top of HTTP, and the tracker's key role is to coordinate peers seeking the same file; as Cohen envisioned, "The tracker's responsibilities are strictly limited to helping peers find each other". In reality the tracker's role is a bit more complex, as many trackers collect data about peers engaged in a swarm. Additionally, some of the newer tracker software being released has integrated the functions of the tracker and the .torrent server. Usually a tracker coordinates multiple torrents, and the most popular trackers are busy coordinating thousands of swarms simultaneously.

While trackers and .torrent files serve as mechanisms to assist the BitTorrent protocol, the process of actually transferring data is handled by the peers engaged in the swarm. It should be noted that .torrent files are not the actual file being shared; rather, .torrent files are the metadata that allows trackers and peers to coordinate their activities. As previously mentioned, the complete file is actually stored on peer seed nodes and not on the tracker server. Since .torrent files are small and require little space to store, one server can easily host thousands of .torrent files without prohibitive server or bandwidth requirements. There is some issue with the bandwidth needed to host a tracker, however, especially if the tracker becomes popular and begins to see heavy usage. Regardless, the tracker's bandwidth requirements are much lower than those of hosting the complete files in a traditional client-server model, such as one would encounter with an FTP site.

Cohen's vision of "tit for tat" is the sole incentive measure he saw as necessary for the protocol's success. Peers seek tit for tat behavior from others and discourage free riding by a "choke/unchoke" policy. This choke policy uses a process known as "optimistic unchoking" to constantly seek other swarm peers who may have more beneficial connections to offer an interested peer. There has been some research on the tit for tat algorithm that models rational users and then studies their behavior. This work defined rational users as those peer nodes manipulating their client software beyond its default settings. The fact that many newer BitTorrent clients allow custom tweaking of upload or download speeds indicates that perhaps the original tit for tat coding was too good, and thus detrimental to other functions of the peer node such as normal HTTP traffic. Some BitTorrent FAQs recommend limiting uploads to approximately 80% of known capacity, and personal tests indicate this strategy does benefit download speeds.

The final important aspect of the BitTorrent protocol's architecture is its use of a "rarest piece first" algorithm when a peer chooses which piece to download. (This should not be confused with the "endgame mode", which applies only to the last few pieces of a download.) The rarest first algorithm has as its goal the uniform distribution of data across peers. A rarest first policy requires a seed to upload new file chunks (those not yet uploaded to a swarm) to the newest peer connecting to a torrent. This policy encourages distribution of the file further across peer nodes. The rarest first algorithm is an interesting aspect of BitTorrent that, when combined with optimistic unchoking, may explain why the protocol has achieved such success.

4. Terminology

These are the common terms that one would come across while making a typical BitTorrent file transfer.

• Torrent : This refers to the small metadata file you receive from the web server (the one that ends in .torrent). Metadata here means that the file contains information about the data you want to download, not the data itself.

• Peer : A peer is another computer on the Internet that you connect to and transfer data with. Generally a peer does not have the complete file.
• Leeches : They are similar to peers in that they do not have the complete file. The main difference between the two is that a leech will not upload once the file is downloaded.
• Seed : A computer that has a complete copy of a certain torrent. Once a client downloads a file completely, he can continue to upload the file, which is called seeding. This is a good practice in the BitTorrent world since it allows other users to obtain the file easily.
• Reseed : When there are zero seeds for a given torrent, eventually all the peers will get stuck with an incomplete file, since no one in the swarm has the missing pieces. When this happens, a seed must connect to the swarm so that those missing pieces can be transferred. This is called reseeding.
• Swarm : The group of machines that are collectively connected for a particular file.
• Tracker : A server on the Internet that acts to coordinate the action of BitTorrent clients. The clients are in constant touch with this server to learn about the peers in the swarm.
• Share ratio : The ratio of the amount of data uploaded to the amount downloaded. A ratio of 1 means that one has uploaded the same amount of a file as has been downloaded.
• Distributed copies : Sometimes the peers in a swarm will collectively have a complete file. Such copies are called distributed copies.

• Choked : The state of an uploader who does not want to send anything on his link. In such cases, the connection is said to be choked.
• Interested : The state of a downloader which indicates that the other end has some pieces that the downloader wants. The downloader is then said to be interested in the other end.
• Snubbed : If the client has not received anything after a certain period, it marks a connection as snubbed, in that the peer on the other end has chosen not to send for a while.
• Optimistic unchoking : Periodically, the client shakes up the list of uploaders and tries sending on different connections that were previously choked, while choking the connections it was just using. This is called optimistic unchoking.

5. Architecture of BitTorrent

The BitTorrent protocol can be split into the following five main components:

• Metainfo File - A file which contains all details necessary for the protocol to operate.
• Tracker - A server which helps manage the BitTorrent protocol.
• Peers - Users exchanging data via the BitTorrent protocol.
• Data - The files being transferred across the protocol.
• Client - The program which sits on a peer's computer and implements the protocol.

Peers use TCP (Transmission Control Protocol) to communicate and send data. This protocol is preferable to other protocols such as UDP (User Datagram Protocol) because TCP guarantees reliable and in-order delivery of data from sender to receiver. UDP cannot give such guarantees, and data can become scrambled or lost altogether. The tracker allows peers to query which peers have what data, and allows them to begin communication. Peers communicate with the tracker in plain text via HTTP (Hypertext Transfer Protocol). The following diagram illustrates how peers interact with each other and also communicate with a central tracker.

Fig 5.1 : Architecture of a BitTorrent System

5.1 Metainfo File

When someone wants to publish data using the BitTorrent protocol, they must create a metainfo file. This file is specific to the data they are publishing, and contains all the information about a torrent, such as the data to be included and the address of the tracker to connect to. A tracker is a server which 'manages' a torrent, and is discussed in the next section. The file is given a '.torrent' extension, and the data is extracted from the file by a BitTorrent client. This is a program which runs on the user's computer and implements the BitTorrent protocol. Every metainfo file must contain the following information (or 'keys'):

• info: A dictionary which describes the file(s) of the torrent, either the single file or the directory structure for multiple files. Hashes for every data piece, in SHA-1 format, are stored here.
• announce: The announce URL of the tracker, as a string.

The following are optional keys which can also be used:

• announce-list: Used to list backup trackers.
• creation date: The creation time of the torrent as a UNIX time stamp (integer seconds since 1-Jan-1970 00:00:00 UTC).
• comment: Any comments by the author.
• created by: Name and version of the programme used to create the metainfo file.

These keys are structured in the metainfo file as follows:

    {'info': {'piece length': 131072,
              'length': 38190848L,
              'name': 'Cory_Doctorow_Microsoft_Research_DRM_talk.mp3',
              'pieces': '\xcb\xfaz\r\x9b\xe1\x9a\xe1\x83\x91~\xed@\.....'},
     'announce': 'http://tracker.var.cc:6969/announce',
     'creation date': 1089749086L}
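The role of the per-piece SHA-1 hashes in the 'info' dictionary can be illustrated with a short Python sketch. It assumes that the 'pieces' value is the concatenation of one 20-byte SHA-1 digest per piece, in piece order; the helper names (`make_pieces_field`, `split_piece_hashes`, `verify_piece`) and the tiny 64-byte piece length are hypothetical and chosen only to keep the example readable.

```python
import hashlib

SHA1_LEN = 20  # a SHA-1 digest is always 20 bytes long


def make_pieces_field(data, piece_length):
    """Build a 'pieces' value: one SHA-1 digest per piece, concatenated."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def split_piece_hashes(pieces):
    """Split a 'pieces' value back into its individual 20-byte digests."""
    if len(pieces) % SHA1_LEN != 0:
        raise ValueError("'pieces' length must be a multiple of 20")
    return [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]


def verify_piece(index, piece_data, pieces):
    """Return True if a downloaded piece matches its published hash."""
    expected = split_piece_hashes(pieces)[index]
    return hashlib.sha1(piece_data).digest() == expected


# Toy payload of 160 bytes cut into 64-byte pieces (64 + 64 + 32 bytes).
data = bytes(range(160))
pieces = make_pieces_field(data, piece_length=64)

print(len(split_piece_hashes(pieces)))            # number of pieces
print(verify_piece(2, data[128:], pieces))        # intact last piece
print(verify_piece(0, b"x" + data[1:64], pieces)) # corrupted first piece
```

A client that keeps only pieces for which `verify_piece` succeeds can be confident, once every piece has passed, that it holds the complete and unmodified data.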

Instead of being transmitted in plain text format, the keys contained in the metainfo file are encoded before they are sent. Encoding is done using a BitTorrent-specific method known as 'bencoding'.

5.1.1 Bencoding : Bencoding is used by BitTorrent to send loosely structured data between the BitTorrent client and a tracker. Bencoding supports byte strings, integers, lists and dictionaries.

Bencoding Structure:

• Byte Strings : <string length in base ten ASCII> : <string data>
• Integers: i<base ten ASCII>e
• Lists: l<bencoded values>e
• Dictionaries: d<bencoded string><bencoded element>e

Negative integers are allowed, but prefixing the number with a zero is not permitted. However, '0' itself is allowed. Bencoding uses the beginning delimiters 'i' / 'l' / 'd' for integers, lists and dictionaries respectively. Ending delimiters are always 'e'. Delimiters are not used for byte strings.

Examples of bencoding:

    4:spam              // represents the string "spam"
    i3e                 // represents the integer "3"
    l4:spam4:eggse      // represents the list of two strings: ["spam", "eggs"]
    d4:spaml1:a1:bee    // represents the dictionary {"spam" => ["a", "b"]}

5.1.2 Metainfo File Distribution : Because all information which is needed for the torrent is included in a single file, this file can easily be distributed via other protocols, and as the file is replicated, the number of peers can increase very quickly. The most popular method of distribution is using a public indexing site which hosts the metainfo files. A seed will upload the file, and then others can download a copy of the file over the HTTP protocol and participate in the torrent.

5.2 Tracker

A tracker is used to manage users participating in a torrent (known as peers). It stores statistics about the torrent, but its main role is to allow peers to 'find each other' and start communication, i.e. to find peers with the data they require. Peers know nothing of each other until a response is received from the tracker. Whenever a peer contacts the tracker, it reports which pieces of a file it has. That way, when another peer queries the tracker, it can provide a random list of peers who are participating in the torrent and have the required piece.

Fig 5.2 : Tracker
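Since the data exchanged with the tracker relies on the bencoding rules of Section 5.1.1, a small Python sketch of an encoder and decoder may make those rules concrete. This is a minimal illustration and not a reference implementation: the names `bencode` and `bdecode` are hypothetical, text strings are assumed to be UTF-8, and dictionary keys are written in sorted order, which is what the protocol specification requires although this report does not mention it.

```python
def bencode(value):
    """Encode ints, byte/text strings, lists and dicts (assumed UTF-8 text)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)  # no delimiters for strings
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])       # keys in sorted order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def _decode(data, i):
    """Decode one value starting at offset i; return (value, next offset)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        text = data[i + 1:end]
        # Leading zeros are not permitted, but a bare '0' is.
        if (text.startswith(b"0") and text != b"0") or text.startswith(b"-0"):
            raise ValueError("invalid integer %r" % text)
        return int(text), end + 1
    if lead == b"l":
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        i, result = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be byte strings")
            result[key], i = _decode(data, i)
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError("unexpected byte at offset %d" % i)


def bdecode(data):
    """Decode a complete bencoded byte string."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


print(bencode({"spam": ["a", "b"]}))
print(bdecode(b"l4:spam4:eggse"))
```

The decoder returns byte strings rather than text, because values such as the 'pieces' field are raw binary and cannot safely be treated as characters.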

A tracker is an HTTP/HTTPS service and typically works on port 6969. The address of the tracker managing a torrent is specified in the metainfo file; a single tracker can manage multiple torrents. Multiple trackers can also be specified, as backups, which are handled by the BitTorrent client running on the user's computer. BitTorrent clients communicate with the tracker using HTTP GET requests, which is a standard CGI method. This consists of appending a "?" to the URL, and separating parameters with a "&". The parameters accepted by the tracker are:

• info_hash: 20-byte SHA1 hash of the info key from the metainfo file.
• peer_id: 20-byte string used as a unique ID for the client.
• port: The port number the client is listening on.
• uploaded: The total amount uploaded since the client sent the 'started' event to the tracker, in base ten ASCII.
• downloaded: The total amount downloaded since the client sent the 'started' event to the tracker, in base ten ASCII.
• left: The number of bytes the client still has to download, in base ten ASCII.
• compact: Indicates that the client accepts compacted responses. The peer list can then be replaced by a string of 6 bytes per peer. The first 4 bytes are the host, and the last 2 bytes are the port.
• event: If specified, must be one of the following: started, stopped, completed.
• ip: (optional) The IP address of the client machine, in dotted format.
• numwant: (optional) The number of peers the client wishes to receive from the tracker.
• key: (optional) Allows a client to identify itself if its IP address changes.
• trackerid: (optional) If a previous announce contained a tracker id, it should be set here.

The tracker then responds with a "text/plain" document with the following keys:

• failure message: If present, then no other keys are included. The value is a human readable error message as to why the request failed.
• warning message: Similar to failure message, but the response still gets processed.
• interval: The number of seconds a client should wait between sending regular requests to the tracker.
• min interval: Minimum announce interval.
• tracker id: A string that the client should send back with its next announce.
• complete: Number of peers with the complete file.
• incomplete: Number of non-seeding peers (leechers).
• peers: A list of dictionaries including the peer id, IP and port of each peer.

5.2.1 Scraping

Scraping is the process of querying the state of a given torrent (or all torrents) that the tracker is managing. The result is known as a "scrape page". To get the scrape, you must start with the announce URL, find the last '/', and if the text immediately following the '/' is 'announce', then this can be substituted for 'scrape' to find the scrape page. Examples:

Announce URL                       Scrape URL
http://example.com/announce        http://example.com/scrape
http://example.com/a/announce      http://example.com/a/scrape
http://example.com/announce.php    http://example.com/scrape.php

The tracker then responds with a "text/plain" document with the following bencoded keys:
• files: A dictionary containing one key pair for each torrent. Each key is made up of a 20-byte binary hash value. The value of that key is then a nested dictionary with the following keys:
  • complete: number of peers with the entire file (seeds)
  • downloaded: total number of times the entire file has been downloaded
  • incomplete: the number of active downloaders (leechers)
  • name: (optional) the torrent name

5.3 Peers

Peers are other users participating in a torrent, and have the partial file, or the complete file (known as a seed). Pieces are requested from peers, but are not guaranteed to be sent, depending on the status of the peer. BitTorrent uses TCP (Transmission Control Protocol) ports 6881-6889 to send messages and data between peers, and unlike other protocols, does not use UDP (User Datagram Protocol).
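The announce-to-scrape rule from section 5.2.1 can be sketched in a few lines of Python. This is only an illustration: the function name scrape_url is hypothetical, and the sketch assumes the announce URL carries no query string containing a '/'.

```python
from typing import Optional


def scrape_url(announce_url: str) -> Optional[str]:
    """Derive a scrape URL from an announce URL (hypothetical helper).

    Returns None when the text after the last '/' does not start with
    'announce', in which case the rule gives no scrape page.
    """
    head, slash, tail = announce_url.rpartition("/")
    if not slash or not tail.startswith("announce"):
        return None
    # Keep whatever follows 'announce' (for example '.php') unchanged.
    return head + slash + "scrape" + tail[len("announce"):]


if __name__ == "__main__":
    for url in (
        "http://example.com/announce",
        "http://example.com/a/announce",
        "http://example.com/announce.php",
    ):
        print(url, "->", scrape_url(url))
```

The three URLs printed by the usage lines are the same ones listed in the table of examples.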

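The tracker request and the compact peer list described in the tracker section can also be illustrated with a short Python sketch. The function names, the example peer id and the use of network (big-endian) byte order for the port are assumptions made for this illustration; the report itself only states that each peer takes 6 bytes, 4 for the host and 2 for the port.

```python
import socket
import struct
from urllib.parse import urlencode


def build_announce_url(base_url, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None):
    """Append the tracker parameters to the announce URL (hypothetical helper)."""
    params = {
        "info_hash": info_hash,    # 20 raw bytes, percent-escaped by urlencode
        "peer_id": peer_id,        # 20 raw bytes
        "port": port,
        "uploaded": uploaded,      # base ten ASCII after encoding
        "downloaded": downloaded,
        "left": left,
        "compact": 1,              # ask for the 6-bytes-per-peer format
    }
    if event is not None:
        params["event"] = event    # started, stopped or completed
    return base_url + "?" + urlencode(params)


def decode_compact_peers(blob):
    """Split a compact peer list into (host, port) tuples.

    Assumes the 2-byte port is stored in network byte order.
    """
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        host = socket.inet_ntoa(blob[offset:offset + 4])
        (port,) = struct.unpack(">H", blob[offset + 4:offset + 6])
        peers.append((host, port))
    return peers


if __name__ == "__main__":
    print(build_announce_url("http://some.url.com/announce", bytes(range(20)),
                             b"-XX0001-abcdefghijkl", 6881, 0, 0, 1024,
                             event="started"))
    print(decode_compact_peers(bytes([192, 168, 0, 1, 0x1A, 0xE1])))
```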
5.3.1 Piece Selection

Peers continuously queue up the pieces for download which they require. Therefore the tracker is constantly replying to the peer with a list of peers who have the requested pieces. Which piece is requested depends upon the BitTorrent client. There are three stages of piece selection, which change depending on which stage of completion a peer is at.

5.3.2 Random First Piece

When downloading first begins, as the peer has nothing to upload, a piece is selected at random to get the download started. Random pieces are then chosen until the first piece is completed and checked. Once this happens, the 'rarest first' strategy begins.

5.3.3 Rarest First

When a peer selects which piece to download next, the rarest piece will be chosen from the current swarm, i.e. the piece held by the lowest number of peers. This means that the most common pieces are left until later, and focus goes to replication of rarer pieces. At the beginning of a torrent, there will be only one seed with the complete file. There would be a possible bottleneck if multiple downloaders were trying to access the same piece. Rarest first avoids this because different peers have different pieces. As more peers connect, rarest first will take some load off of the tracker, as peers begin to download from one another.

Eventually the original seed will disappear from a torrent. This could be because of cost reasons, or most commonly because of bandwidth issues. Losing a seed runs the risk of pieces being lost if no current downloaders have them. Rarest first works to prevent the loss of pieces by replicating the pieces most at risk as quickly as possible. If the original seed goes before at least one other peer has the complete file, then no one will reach completion, unless a seed re-connects.
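A minimal Python sketch of the rarest first rule is given below. The function name pick_rarest_piece and the representation of each peer's pieces as a set of piece indexes are assumptions made for illustration; real clients differ in how they break ties and in how they track availability.

```python
import random
from collections import Counter


def pick_rarest_piece(needed, peer_pieces, rng=random):
    """Pick the needed piece held by the fewest peers (hypothetical helper).

    needed      -- set of piece indexes this peer still requires
    peer_pieces -- dict mapping a peer name to the set of pieces it holds
    Returns None when no connected peer has any needed piece.
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)
    available = [piece for piece in needed if counts[piece] > 0]
    if not available:
        return None
    rarest = min(counts[piece] for piece in available)
    # Ties are broken at random so that peers do not all pick the same piece.
    candidates = sorted(p for p in available if counts[p] == rarest)
    return rng.choice(candidates)


if __name__ == "__main__":
    swarm = {"A": {0, 1}, "B": {0, 2}, "C": {0, 1}}
    print(pick_rarest_piece({0, 1, 2}, swarm))
```

In the usage lines, piece 0 is held by three peers, piece 1 by two and piece 2 by one, so piece 2 is the one selected.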

5.3.4 Endgame Mode

When a download nears completion, and the peer is waiting for a piece from a peer with slow transfer rates, completion may be delayed. To prevent this, the remaining sub-pieces are requested from all peers in the current swarm.

5.3.5 Peer Distribution

The role of the tracker ends once peers have 'found each other'. From then on, communication is done directly between peers, and the tracker is not involved. The set of peers a BitTorrent client is in communication with is known as a swarm. To maintain the integrity of the data which has been downloaded, a peer does not report that it has a piece until it has performed a hash check against the hash contained in the metainfo file. Peers will continue to download data from all available peers that they can, i.e. peers that possess the required pieces. Peers can block others from downloading data if necessary. This is known as choking.

5.3.6 Choking

When a peer receives a request for a piece from another peer, it can opt to refuse to transmit that piece. If this happens, the peer is said to be choked. This can be done for different reasons, but the most common is that, by default, a client will only maintain a fixed number of simultaneous uploads (max_uploads). All further requests to the client will be marked as choked. Usually the default for max_uploads is 4.
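The max_uploads limit can be pictured as a fixed number of upload slots. The Python sketch below is a simplified model under that assumption; the class name UploadSlots and its methods are hypothetical, and it ignores transfer rates entirely.

```python
class UploadSlots:
    """Toy model of choking: at most max_uploads peers are served at once."""

    def __init__(self, max_uploads=4):
        self.max_uploads = max_uploads
        self.unchoked = set()

    def handle_request(self, peer):
        """Return True if the peer is served, False if it is choked."""
        if peer in self.unchoked:
            return True
        if len(self.unchoked) < self.max_uploads:
            self.unchoked.add(peer)
            return True
        return False  # all slots are busy, so this request is choked

    def release(self, peer):
        """Free the slot held by a peer, for example when it disconnects."""
        self.unchoked.discard(peer)


if __name__ == "__main__":
    slots = UploadSlots()
    for name in ("p1", "p2", "p3", "p4", "p5"):
        print(name, "served" if slots.handle_request(name) else "choked")
```

With the default of four slots, the fifth requesting peer in the usage lines is choked until one of the first four is released.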

Fig 5.3 : Choking by a peer

The peer will then remain choked until an unchoke message is sent. Another example of when a peer is choked would be when downloading from a seed, and the seed requires no pieces. To ensure fairness between peers, there is a system in place which rotates which peers are downloading. This is known as optimistic unchoking.

5.3.7 Optimistic Unchoking

To ensure that connections with the best data transfer rates are not the only ones favoured, each peer has a reserved 'optimistic unchoke' which is left unchoked regardless of the current transfer rate. The peer which is assigned to this is rotated every 30 seconds. This is enough time for the upload / download rates to reach maximum capacity. The peers then cooperate using the tit for tat strategy, where the downloader responds in one period with the same action the uploader used in the last period.

5.3.8 Communication Between Peers

Peers which are exchanging data are in constant communication. Connections are symmetrical, and therefore messages can be exchanged in both

directions. These messages are made up of a handshake, followed by a never-ending stream of length-prefixed messages.

5.3.9 Handshaking

Handshaking is performed as follows:
1. The handshake starts with character 19 (base 10) followed by the string 'BitTorrent protocol'.
2. A 20 byte SHA1 hash of the bencoded info value from the metainfo is then sent. If this does not match between peers the connection is closed.
3. A 20 byte peer id is sent which is then used in tracker requests and included in peer requests. If the peer id does not match the one expected, the connection is closed.

5.3.10 Message Stream

This constant stream of messages allows all peers in the swarm to send data, and control interactions with other peers.

Prefix  Message  Structure  Additional Information
0  choke  <len=0001><id=0>  Fixed length, no payload. This enables a peer to block another peer's request for data.

1  unchoke  <len=0001><id=1>  Fixed length, no payload. Unblock peer, and if they are still interested in the data, upload will begin.
2  interested  <len=0001><id=2>  Fixed length, no payload. A user is interested if a peer has the data they require.
3  not interested  <len=0001><id=3>  Fixed length, no payload. The peer does not have any data required.
4  have  <len=0005><id=4><piece index>  Fixed length. Payload is the zero-based index of the piece. Details the pieces that peer currently has.

5  bitfield  <len=0001+X><id=5><bitfield>  Variable length. X is the length of bitfield. Sent immediately after handshaking. Optional, and only sent if client has pieces. Payload represents pieces that have been successfully downloaded.
6  request  <len=0013><id=6><index><begin><length>  Fixed length, used to request a block of a piece. The payload contains integer values specifying the index, begin location and length.
7  piece  <len=0009+X><id=7><index><begin><block>  Sent together with request

messages. Variable length, where X is the length of the block. The payload contains integer values specifying the index and begin location, followed by the block itself.
8  cancel  <len=0013><id=8><index><begin><length>  Fixed length, used to cancel block requests. Payload is the same as 'request'. Typically used during 'end game' mode.

After handshaking, by default, connections start out as choked, and not interested. A peer will be 'interested' in data if there is a peer which has the required pieces. If the peer which has this data is not choked, then data will be transferred.

5.4 Data

BitTorrent is very versatile, and can be used to transfer a single file, or multiple files of any type, contained within any number of directories. File sizes can vary hugely, from kilobytes to hundreds of gigabytes.

5.4.1 Piece Size

Data is split into smaller pieces which are sent between peers using the BitTorrent protocol. These pieces are of a fixed size, which enables the tracker to keep tabs on who has which pieces of data. This also breaks the file into verifiable pieces: each piece can then be assigned a hash code, which can be checked by the downloader for data integrity. These hashes are stored as part of the 'metainfo file' which is discussed in the next section.

The size of the pieces remains constant throughout all files in the torrent except for the final piece which is irregular. The piece size a torrent is allocated depends on the amount of data. Piece sizes which are too large will cause inefficiency when downloading (larger risk of data corruption in larger pieces due to fewer integrity checks), whereas if the piece sizes are too small, more hash checks will need to be run. As the number of pieces increases, more hash codes need to be stored in the metainfo file. Therefore, as a rule of thumb, pieces should be selected so that the metainfo file is no larger than 50 - 75 KB. The main reason for this is to limit the amount of hosting storage and bandwidth needed by indexing servers.

The most common piece sizes are 256 KB, 512 KB and 1 MB. The number of pieces is therefore: total length / piece size, rounded up. Pieces may overlap file boundaries. For example, a 1.4 MB file could be split into the following pieces. This shows 5 * 256 KB pieces, and a final piece of 120 KB.
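The piece arithmetic and the per-piece hash check can be sketched in Python as follows. The helper names are hypothetical, and the example assumes that the 1.4 MB file is exactly 1400 KB with 1 KB = 1024 bytes, which is what makes the final piece come out at 120 KB.

```python
import hashlib

KB = 1024


def piece_lengths(total_length, piece_size):
    """Return the length of every piece; only the last one may be shorter."""
    full, remainder = divmod(total_length, piece_size)
    return [piece_size] * full + ([remainder] if remainder else [])


def piece_hashes(data, piece_size):
    """Return the 20-byte SHA1 digest of each piece of data."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify_piece(piece, expected_hash):
    """Check a downloaded piece against the hash from the metainfo file."""
    return hashlib.sha1(piece).digest() == expected_hash


if __name__ == "__main__":
    lengths = piece_lengths(1400 * KB, 256 * KB)
    print(len(lengths), "pieces, final piece", lengths[-1] // KB, "KB")
```

A downloader would call verify_piece on each completed piece before reporting it to other peers, and discard the piece if the check fails.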

Fig 5.4 : Pieces of a file

5.5 BitTorrent Clients

A BitTorrent client is an executable program which implements the BitTorrent protocol. It runs together with the operating system on a user's machine, and handles interactions with the tracker and peers. The client sits on the operating system and is responsible for controlling the reading / writing of files, opening sockets etc. A metainfo file must be opened by the client to start partaking in a torrent. Once the file is read, the necessary data is extracted, and a socket must be opened to contact the tracker.

BitTorrent clients use TCP ports 6881-6999. To find an available port, the client will start at the lowest port, and work upwards until it finds one it can use. This means the client will only use one port, and opening another BitTorrent client will use another port. A client can handle multiple torrents running concurrently.

Clients come in many flavours, and can range from basic applications with few features to very advanced, customisable ones. For example, some advanced features are metainfo file wizards and inbuilt trackers. These additional features mean different clients behave very differently, and may use multiple ports, depending on the number of processes they are running. As all applications implement the same protocol, there are no incompatibility issues; however, because of various tweaks and improvements between clients, a peer may experience better performance from peers running the same client.

5.6 Sub Protocols :

BitTorrent can be described in terms of two sub-protocols: one which describes interactions between the tracker and all clients, and one which describes all client-to-client interactions.

5.6.1 THP: Tracker HTTP Protocol

The tracker protocol is implemented on top of HTTP/HTTPS. This means that the machine running the tracker runs an HTTP or HTTPS server, and has the behaviour described below:
1. The client sends a GET request to the tracker URL, with certain CGI variables and values added to the URL. This is done in the standard way, i.e. if the base URL is “http://some.url.com/announce”, the full URL would be of this form: “http://some.url.com/announce?var1=value1&var2=value2&var3=value3”.
2. The tracker responds with a “text/plain” document, containing a bencoded dictionary. This dictionary has all the information required for the client.
3. The client then sends re-requests, either at regular intervals, or when an event occurs, and the tracker responds.

The CGI variables and values added to the base URL by the client sending a GET request are:
• info_hash: The 20 byte SHA1 hash calculated from whatever value the info key maps

to in the metainfo file.
• peer_id: A 20 character long id of the downloading client, randomly generated at the start of every download. There is no formal definition on how to generate this id, but some client applications have adopted some semiformal standards on how to generate this id.
• port: The port number that the client is listening on. This is usually in the range 6881-6889.
• uploaded: The amount of data uploaded so far by the client. There is no official definition on the unit, but generally bytes are used.
• left: How much the user has left for the download to be complete, in bytes.
• ip: This is an optional variable, giving the IP address of the client. This can usually be extracted from the TCP connection, but this field is useful if the client and tracker are on the same machine, or behind the same NAT gateway. In both cases, the tracker then might publish an unroutable IP address to the client.
• event: An optional variable, corresponding to one of four possibilities:
  • started: Sent when the client starts the download.
  • stopped: Sent when the client stops downloading.
  • completed: Sent when the download is complete. If the download is complete

at start up, this message should not be sent.
  • empty: Has the same effect as if the event key is nonexistent. In either case, the message in question is one of the messages sent with regular intervals.

There are some optional variables that can be sent along with the GET request that are not specified in the official description of the protocol, but are implemented by some tracker servers:
• numwant: The number of peers the client wants in the response.
• key: An identification key that is not published to other peers. peer_id is public, and is thus useless as authorization. key is used if the peer changes IP number to prove its identity to the tracker.
• trackerid: If a tracker previously gave its trackerid, this should be given here.

As mentioned earlier, the response is a “text/plain” response with a bencoded dictionary. This dictionary contains the following keys:
• failure reason: If this key is present, no other keys are included. The value mapped to this key is a human readable string with the reason why the connection failed.
• interval: The number of seconds that the client should wait between regular re-requests.

• peers: Maps to a list of dictionaries, that each represent a peer, where each dictionary has the keys:
  • peer_id: The id of the peer in question. The tracker obtained this by the peer_id variable in the GET request sent to the tracker.
  • ip: The address of the peer, either the IP address or the DNS domain name.
  • port: The port number that the peer listens on.

These are the keys required by the official protocol specification, but here as well there are optional extensions:
• min interval: If present, the client must not do re-requests more often than this.
• warning message: Has the same information as failure reason, but the other keys in the dictionary are present.
• tracker id: A string identifying the tracker. A client should resend it in the trackerid variable to the tracker.
• complete: This is the number of peers that have the complete file available for upload.
• incomplete: The number of peers that do not have the complete file yet.

5.6.2 PWP: Peer Wire Protocol

The peer wire (peer to peer) protocol runs over TCP. Message passing is symmetric, i.e. the same messages are sent in both directions. When a client wants to initiate a connection, it sets up the TCP connection and sends a

handshake message to the other peer. If the message is acceptable, the receiving side sends a handshake message back. If the initiator accepts this handshake, message passing can initiate, and continues indefinitely. All integers are encoded as four byte big-endian, except the first length prefix in the handshake.

Handshake message

The handshake message consists of five parts:
• A single byte, containing the decimal value 19. This is the length of the character string following this byte.
• A character string “BitTorrent protocol”, which describes the protocol. Newer protocols should follow this convention to facilitate easy identification of protocols.
• Eight reserved bytes for further extension of the protocol. All bytes are zero in current implementations.
• A 20 byte SHA1 hash of the value mapping to the info key in the torrent file. This is the same hash sent to the tracker in the info_hash variable.
• The 20 byte character string representing the peer id. This is the same value sent to the tracker.

If a peer is the first recipient of a handshake, and the info_hash doesn’t match any torrent it is serving, it should break the connection. If the initiator of the connection receives a handshake where the peer id doesn’t match the id received from the tracker, the connection should be dropped.

Each peer needs to keep the state of each connection. The state consists of two values, interested and choking. A peer can be either interested or not in another peer,

and either choke or not choke the other peer. Choking means that no requests will be answered, and interested means that the peer is interested in downloading pieces of the file from the other peer. This means that each peer needs four Boolean values for each connection to keep track of the state.
• am_interested
• am_choking
• peer_interested
• peer_choking

All connections start out as not interested and choking for both peers. Clients should keep the am_interested value updated continuously, and report changes to the other peer.

The messages sent after the handshaking are structured as:
[message length as an integer] [single byte describing message type] [payload]

Keep alive messages are sent at regular intervals, and they are simply a message with length 0, and no type or payload.

Types 0, 1, 2 and 3 are choke, unchoke, interested and not interested respectively. All of them have length 1 and no payload. These messages simply describe changes in state.

Type 4 is a have. This message has length = 5, and a payload that is a single integer, giving the integer index of which piece of the file the peer has successfully downloaded and verified.

Type 5 is bitfield. This message is only sent directly after handshake. It contains a bitfield representation of which pieces the peer has. The payload is of variable length, and consists of a bitmap, where byte 0 corresponds to pieces 0-7, byte 1 to pieces 8-15 etc. A bit set to 1 represents having the piece. Peers that have no pieces can neglect to send this message.
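The message layout above (a four byte big-endian length, one type byte, then the payload) can be sketched with Python's struct module. The function names are hypothetical. The text does not say which bit inside a bitfield byte stands for which piece, so the sketch assumes the most significant bit of byte 0 is piece 0.

```python
import struct


def encode_message(msg_type=None, payload=b""):
    """Build one length-prefixed message; msg_type=None gives a keep alive."""
    if msg_type is None:
        return struct.pack(">I", 0)
    # The length counts the type byte plus the payload.
    return struct.pack(">IB", 1 + len(payload), msg_type) + payload


def decode_message(buffer):
    """Return (msg_type, payload, bytes_consumed), or None if incomplete."""
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    if length == 0:
        return (None, b"", 4)  # keep alive: no type, no payload
    return (buffer[4], buffer[5:4 + length], 4 + length)


def pieces_from_bitfield(bitfield):
    """List the piece indexes whose bit is set (assumed high bit first)."""
    return [byte_index * 8 + bit
            for byte_index, byte in enumerate(bitfield)
            for bit in range(8)
            if byte & (0x80 >> bit)]


if __name__ == "__main__":
    have = encode_message(4, struct.pack(">I", 7))  # "have" for piece 7
    print(have.hex(), decode_message(have))
    print(pieces_from_bitfield(bytes([0b10100000, 0b00000001])))
```

Returning the number of bytes consumed lets a caller keep a receive buffer and decode messages one at a time as more data arrives on the TCP connection.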

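The five-part handshake from the peer wire protocol section can be assembled and checked as in the Python sketch below. The function names and the example peer id are hypothetical; the field sizes follow the description in the text, which gives a total of 1 + 19 + 8 + 20 + 20 = 68 bytes.

```python
PROTOCOL_STRING = b"BitTorrent protocol"  # 19 characters
HANDSHAKE_LENGTH = 1 + len(PROTOCOL_STRING) + 8 + 20 + 20


def build_handshake(info_hash, peer_id):
    """Concatenate the five handshake parts (hypothetical helper)."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytes(8)  # all zero, as the text says of current implementations
    return (bytes([len(PROTOCOL_STRING)]) + PROTOCOL_STRING
            + reserved + info_hash + peer_id)


def parse_handshake(data, expected_info_hash):
    """Return the remote peer id, or raise ValueError to drop the connection."""
    if len(data) != HANDSHAKE_LENGTH:
        raise ValueError("handshake has the wrong length")
    if data[0] != len(PROTOCOL_STRING) or data[1:20] != PROTOCOL_STRING:
        raise ValueError("unknown protocol string")
    if data[28:48] != expected_info_hash:
        raise ValueError("info_hash does not match a served torrent")
    return data[48:68]


if __name__ == "__main__":
    message = build_handshake(b"\x11" * 20, b"-XX0001-abcdefghijkl")
    print(len(message), parse_handshake(message, b"\x11" * 20))
```

The initiator would additionally compare the returned peer id with the one received from the tracker and drop the connection on a mismatch.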
Type 6 is a request. The payload consists of three integers: piece index, begin and length. The piece index decides within which piece the client wants to download, begin gives the byte offset within the piece, and length gives the number of bytes the client wants to download. Length is usually a power of two.

Type 7 is a block. This message follows a request. The payload contains the piece index, the begin offset and the data itself that was requested.

Type 8 is cancel. This message has the same payload as request messages, and it is used to cancel requests made.

Peers should continuously update their interested status to neighbours, so that clients know which peers will begin downloading when unchoked.

6. Vulnerabilities of BitTorrent

6.1 Attacks on BitTorrent

As we have seen so far, BitTorrent is one of the most favoured file transfer protocols in today’s world. But it has been exposed to various attacks in the recent past due to vulnerabilities that are being exploited by the hacker community. Here are some of the attacks that are commonly seen.

6.1.1 Pollution attack

1. The peers receive the peer list from the tracker.
2. One peer contacts the attacker for a chunk of the file.
3. The attacker sends back a false chunk.
4. This false chunk will fail its hash check and will be discarded.

5. The attacker requests all chunks from the swarm and wastes their upload bandwidth.

Pollution attacks have become increasingly popular and have been used by anti-piracy groups. In 2005 HBO used pollution attacks to prevent people from downloading their show Rome.

6.1.2 DDOS attack

DDOS stands for distributed denial of service. This attack is possible because the BitTorrent tracker has no mechanism for validating peers. This means there is no way to trace the culprit in this kind of attack. Attacks of this kind are also possible because of the modifications that can be made to the client software.

1. The attacker downloads a large number of torrent files from a web server.
2. The attacker parses the torrent files with a modified BitTorrent client and spoofs his IP address and port number with the victim's as he announces he is joining the swarm.
3. As the tracker receives requests for a list of participating peers from other clients it sends the victim's IP and port number.
4. The peers then attempt to connect to the victim to try and download a chunk of the file.

6.1.3 Bandwidth Shaping

Many ISPs don’t encourage the use of BitTorrent by their users. This is because BitTorrent is usually used to transfer large files, due to which the traffic over the ISPs increases to a large extent. To avoid such exploding traffic on their servers many ISPs have started to avoid the traffic caused by BitTorrent. This can be done by sniffing the packets that pass through and detecting whether they follow the BitTorrent protocol. ISPs make use of filters to find such packets and block them from passing their servers. This has resulted in many file transfer breakdowns across the world.

6.2 Solutions

Many of the attacks that BitTorrent suffers have been dealt with and some measures have been taken to avoid such attacks. Here are a few solutions to the attacks that were discussed above.

6.2.1 Pollution attack

The peers which perform such attacks are identified by tracing their IPs. Then, such IPs are blacklisted to avoid further communication with them. These blacklisted IPs are blocked by denying them connections with other peers. This is done by using software like Peer Guardian or moBlock, which download the list of blacklisted IPs from the internet.

6.2.2 DDOS attack

The main solution to this kind of attack is to have clients parse the response from the tracker. In the case where a host (tracker) does not respond to a peer’s request with a valid BitTorrent protocol message it should be inferred that this host is not running BitTorrent. The peer should then exclude that address from its tracker list, or set a high retry interval for that specific tracker.

Another fix would be for web sites hosting torrents to check and report whether all trackers are active, or even remove the non-responding trackers from the tracker list in the torrent. Another measure could be to restrict the size of the tracker list to reduce the effectiveness of such an attack.

6.2.3 Bandwidth Shaping

There are broadly two approaches followed to counter this type of attack. The first method is to encrypt the packets sent by means of the BitTorrent protocol. By doing this, the filters that sniff packets will not be able to detect such packets as belonging to the BitTorrent protocol. This means that the filters are fooled by the encrypted packets and thus packets can sneak through such filters.

Another approach is to make use of tunnels. Tunnels are dedicated paths where the filters are avoided by using VPN software which connects to the unfiltered networks. This results in successfully bypassing the filters, and thus the packets can be transmitted across networks.

7. Conclusion

BitTorrent pioneered mesh-based file distribution that effectively utilizes all the uplinks of participating nodes. Most follow-on research used similar distributed and randomized algorithms for peer and piece selection, but with different emphasis or twists. This work takes a different approach to the mesh-based file distribution problem by considering it as a scheduling problem, and strives to derive an optimal schedule that could minimize the total elapsed time. By comparing the total elapsed time of BitTorrent and CSFD in a wide variety of scenarios, we are able to determine how close BitTorrent is to the theoretical optimum. In addition, the study of applicability of BitTorrent to real-time media streaming applications shows that with minor modifications, BitTorrent can serve as an effective media streaming tool as well.

BitTorrent’s application in this information sharing age is almost priceless. However, it is still not perfected as it is still prone to malicious attacks and acts of misuse. Moreover, the lifespan of each torrent is still not satisfactory, which means that a file distribution can only survive for a limited period of time. Thus, further analysis and a more thorough study of the protocol will enable one to discover more ways to improve it.

8. References

1. BitTorrent Inc. (2006) http://www.bittorrent.com
2. BitTorrent.Org (2006) http://www.bittorrent.org/protocol.htm
3. Cohen, Bram (2003) Incentives Build Robustness in BitTorrent, May 22 2003 http://www.bitconjurer.org/BitTorrent/bittorrentecon.pdf
4. Cachelogic. BitTorrent bandwidth usage http://www.cachelogic.com/research/2005_slide06.php
5. Information on BitTorrent Protocol en.wikipedia.org/wiki/BitTorrent_(protocol)
6. BitTorrent FAQ: http://btfaq.com

7. BitTorrent Specifications http://wiki.theory.org/BitTorrentSpecification
8. Other Information http://www.dessent.net/btfaq/#compare
