
Design of a BitTorrent Tracker

And Swarm Unification


Călin-Andrei Burloiu, Răzvan Deaconescu, Nicolae Țăpuș
Automatic Control and Computers Faculty
Politehnica University of Bucharest
Emails: calin.burloiu@cti.pub.ro, {razvan.deaconescu,nicolae.tapus}@cs.pub.ro

1. Abstract

BitTorrent is a peer-to-peer file sharing protocol used to distribute large amounts of data, and it is one of the most common protocols for transferring large files. It makes the transfer of such files easier by implementing a different approach. A user can obtain multiple files simultaneously without any considerable loss of transfer rate. It is said to be a lot better than conventional file transfer methods because of the different principle this protocol follows. It also evens out the way a file is shared by allowing a user not just to obtain it but also to share it with others. This is what sets it apart from conventional file transfer methods. A user shares the file he is obtaining so that other users who are trying to obtain the same file find it easier, and these users in turn become involved in the file sharing process. Thus the larger the number of users, the greater the demand and the more easily a file can be transferred between them.

The BitTorrent protocol is built on a technology which makes it possible to distribute large amounts of data without the need for a high-capacity server and expensive bandwidth. This is the most striking feature of this file transfer protocol. The transfer of files never depends on a single source that holds the original copy of the file; instead the load is distributed across a number of such sources. Here not just the sources are responsible for the file transfer; the clients or users who want to obtain the file are also involved in the process. This distributes the load evenly across the users and partially frees the main source from the process, which reduces the network traffic imposed on it. Because of this, BitTorrent has become one of the most popular file transfer mechanisms in today's world. Though the mechanism itself is not as simple as an ordinary file transfer protocol, it has gained its popularity because of the sharing policy that it imposes on its users. Recent surveys made by various organizations show that 35% of overall Internet traffic is due to BitTorrent. This shows that the amount of files being transferred and shared by users through BitTorrent is very large.

2. History

BitTorrent was created by a programmer named Bram Cohen. After inventing this new technology he said, "I decided I finally wanted to work on a project that people would actually use, would actually work and would actually be fun". Before this was invented, there were other techniques for file sharing, but they did not utilize bandwidth effectively. Bandwidth had become a bottleneck in such methods. Even other peer-to-peer file sharing systems like Napster and Kazaa could share files by involving the users in the sharing process, but they required only a subset of users to share the files, not all. This meant that most users could simply download files without needing to upload. This again put a lot of network load on the original sources and on a small number of users, and left the bandwidth of the remaining users inefficiently used. The main intention behind Cohen's invention was to make maximum use of the bandwidth of all users involved in sharing a file. Accordingly, every person who wants to download a file has to contribute to the uploading process as well. This novel concept of Cohen's gave birth to a new peer-to-peer file sharing protocol called BitTorrent.

Cohen invented this protocol in April 2001. The first usable version of BitTorrent appeared in October 2002, but the system needed a lot of fine-tuning. BitTorrent really started to take off in early 2003 when it was used to distribute a new version of Linux and fans of Japanese anime started relying on it to share cartoons. The most important aspect of this protocol is that it makes it possible for people with limited bandwidth to supply very popular files. This means that if you are a small software developer you can put up a package, and if it turns out that millions of people want it, they can get it from each other in an automated way. Thus the bandwidth which used to be a bottleneck in previous systems no longer poses a problem.

3. BitTorrent and Other Approaches

3.1 Other P2P Methods
The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it; this is how both HTTP and FTP work. The clients only speak to the server, and never to each other. The main advantages of this method are that it is simple to set up, and the files are usually available since the servers tend to be dedicated to the task of serving, and are always on and connected to the Internet. However, this model has a significant problem with files that are large or very popular, or both. Namely, it takes a great deal of bandwidth and server resources to distribute such a file, since the server must transmit the entire file to each client. Perhaps you have tried to download a demo of a newly released game, or CD images of a new Linux distribution, and found that all the servers report "too many users," or that there is a long queue to wait through. The concept of mirrors partially addresses this shortcoming by distributing the load across multiple servers. But it requires a lot of coordination and effort to set up an efficient network of mirrors, and it is usually only feasible for the busiest of sites.

Another method of transferring files has become popular recently: the peer-to-peer network, with systems such as Kazaa, eDonkey, Gnutella, Direct Connect, etc. In most of these networks, ordinary Internet users trade files by directly connecting one-to-one. The advantage here is that files can be shared without having access to a proper server, and because of this there is little accountability for the contents of the files. Hence, these networks tend to be very popular for illicit files such as music, movies, pirated software, etc. Typically, a downloader receives a file from a single source; however, the newest versions of some clients allow downloading a single file from multiple sources for higher speeds. The problem of popular downloads discussed above is somewhat mitigated, because there is a greater chance that a popular file will be offered by a number of peers. The breadth of files available tends to be fairly good, though download speeds for obscure files tend to be low. Another common problem sometimes associated with these systems is the significant protocol overhead for passing search queries amongst the peers, and the number of peers that one can reach is often limited as a result. Partially downloaded files are usually not available to other peers, although some newer clients may offer this functionality. Availability is generally dependent on the goodwill of the users, to the extent that some of these networks have tried to enforce rules or restrictions regarding send/receive ratios.

Use of the Usenet binary newsgroups is yet another method of file distribution, one that is substantially different from the other methods. Files transferred over Usenet are often subject to minuscule windows of opportunity. Typical retention times of binary news servers are often as low as 24 hours, and having a posted file available for a week is considered a long time. However, the Usenet model is relatively efficient, in that the messages are passed around a large web of peers from one news server to another, and finally fanned out to the end user from there. Often the end user connects to a server provided by his or her ISP, resulting in further bandwidth savings. Usenet is also one of the more anonymous forms of file sharing, and it too is often used for illicit files of almost any nature. Due to the nature of NNTP, a file's popularity has little to do with its availability and hence downloads from Usenet tend to be quite fast regardless of content. The downsides of this method include a set of rules and procedures, and it requires a certain amount of effort and understanding from the user. Patience is often required to get a complete file due to the nature of splitting big files into a huge number of smaller posts. Finally, access to Usenet often must be purchased due to the extremely high volume of messages in the binary groups.

BitTorrent is closest to Usenet. It is best suited to newer files in which a number of people have an interest. Obscure or older files tend not to be available. Perhaps as the software matures a more suitable means of keeping torrents seeded will emerge, but currently the client is quite resource-intensive, making it cumbersome to share a number of files. BitTorrent also deals well with files that are in high demand, especially compared to the other methods.

3.2 A Typical HTTP File Transfer

Fig : HTTP/FTP File Transfer

The most common type of file transfer is through an HTTP server. In this method, an HTTP server listens to the clients' requests and serves them. Here the client can only depend on the lone server that is providing the file. The overall download is limited by the limitations of that server. This kind of transfer is also subject to a single point of failure: if the server crashes, the whole download process ceases. A single server can handle many such clients and serve the requested file simultaneously to all of them. The file being served is available as one single piece, which means that if the download process stops abruptly in the middle, the whole file has to be downloaded again. The BitTorrent protocol overcomes all these shortcomings and is thus more robust, which is why many people choose it over this traditional method of file transfer.

3.3 The BitTorrent Approach

Fig : BitTorrent File Transfer

In BitTorrent, the data to be shared is divided into many equal-sized portions called pieces. Each piece is further sub-divided into equal-sized sub-pieces called blocks. All clients interested in sharing this data are grouped into a swarm, each of which is managed by a central entity called the tracker. BitTorrent has revolutionized the way files are shared between people. It does not require a user to download a file completely from a single server. Instead a file can be downloaded from many users who are themselves downloading the same file. A user who has the complete file, called the seed, initiates the download by transferring pieces of the file to the users. Once a user has a considerable number of such pieces, he too can start sharing them with other users who are yet to receive those pieces. This concept enables a client not to depend on a server completely, and it also reduces the overall load on the server. Each client independently obtains a file, called a torrent, that contains the location of the tracker along with a hash of each piece. Clients keep each other updated on the status of their download. Clients download blocks from other (randomly chosen) clients who claim they have the corresponding data. Accordingly, clients also send data that they have previously downloaded to other clients. Once a client receives all the blocks for a given piece, he can verify the hash of that piece against the hash provided in the torrent. Thus once a client has downloaded and verified all pieces, he can be confident that he has the complete data.

Both BitTorrent and DAP download files from multiple sources. The files are also divided into pieces in both approaches. But BitTorrent has many features that DAP does not, which has made it the most popular one. In BitTorrent the users participate actively in sharing files along with servers. This is the uniqueness of this protocol. It also needs a dedicated server called a tracker to handle the peers connected in the network. The file transfer in DAP takes place through the traditional HTTP or FTP protocol, which means that the transfer rate will always be limited by the server's bandwidth. If these servers are flooded with requests, they break down and the transaction terminates. This is not the case in BitTorrent, since the whole process does not depend on servers alone. The load is distributed across the network between peers and servers. This makes BitTorrent far better than its competitors like DAP and others.

4. Working of BitTorrent

Fig 3.1 : A Typical BitTorrent System

As previously explained, BitTorrent's design makes it extremely efficient in the sharing of large data files among interested peers. Looking under the hood, BitTorrent is a protocol with some complexity, where modeling is useful to gain a better understanding of its performance. BitTorrent scales well and is a superior method for transferring and disseminating files between interested peers while limiting free riding (peers who download but do not upload) between those same peers. BitTorrent is based on a “tit for tat” reciprocity agreement between users that ultimately results in Pareto efficiency. Pareto efficiency is an important economic concept that maximizes resource allocation among peers to their mutual advantage. Pareto efficiency is the crown jewel of BitTorrent and is the driving force behind the protocol's popularity and success. Cohen's vision of peers simultaneously helping each other by uploading and downloading has been realized by the BitTorrent system.

The protocol shares data through what are known as torrents. For a torrent to be alive or active it must have several key components. These components include a tracker server, a .torrent file, a web server where the .torrent file is stored and a complete copy of the file being exchanged. Each of these components is described in the following paragraphs.

The file being exchanged is the essence of the torrent and a complete copy is referred to as a seed. A seed is a peer in the BitTorrent network willing to share a file with other peers in the network. Why seed owners choose to share their files is debatable, as the BitTorrent protocol does not reward seed behavior. In fact, some researchers believe the protocol lacks any incentive mechanism for encouraging seeds to remain in torrents. Some argue that the lack of incentive in the protocol is a fundamental design flaw that leads to the punishment of seeds. Peers lacking the file and seeking it from seeds are called leechers. While seeds only upload to leechers, leechers may both download from seeds and upload to other leechers. BitTorrent's protocol is designed so leeching peers seek each other out for data transfer in a process known as “optimistic unchoking”. Together, seeds and leechers engaged in file transfer are referred to as a swarm.

A swarm is coordinated by a tracker server serving the particular torrent, and interested peers find the tracker via metadata known as a .torrent file. Since BitTorrent has no built-in search functionality, .torrent files are usually located via HTTP through search engines or trackers. The first step in the BitTorrent exchange occurs when a peer downloads a .torrent file from a server. The role of .torrent files is to provide the metadata that allows the protocol to function; .torrent files can be viewed as surrogates for the files being shared. These .torrent files contain key pieces of data needed to function correctly, including file length, assigned name, hashing information about the file and the URL of the tracker coordinating the torrent activity. Torrent files can be created using a program such as MakeTorrent, another open source tool available under the free software model.

When a .torrent file is opened by the peer's client software, the peer then connects to the tracker server responsible for coordinating activity for that specific torrent. The tracker and client communicate by a protocol layered on top of HTTP, and the tracker's key role is to coordinate peers seeking the same file. As Cohen envisioned, “The tracker's responsibilities are strictly limited to helping peers find each other”. In reality the tracker's role is a bit more complex, as many trackers collect data about peers engaged in a swarm. Additionally, some of the newer tracker software being released has integrated the functions of the tracker and .torrent server. Leechers and seeds are coordinated by the tracker server and the peers periodically update the tracker on their status, allowing the tracker to have a global view of the system. The data monitored by the tracker can include peer IP addresses, amount of data uploaded/downloaded for specific peers, data transfer rates among peers, the percentage of the total file downloaded, length of time connected to the tracker, and the ratio of sharing among peers. Usually a tracker coordinates multiple torrents and the most popular trackers are busy coordinating thousands of swarms simultaneously. It should be noted that .torrent files are not the actual file being shared; rather, .torrent files are the metadata which allows trackers and peers to coordinate their activities. As previously mentioned, the complete file is actually stored on peer seed nodes and not the tracker server. Since .torrent files are small and require little space to store, one server can easily host thousands of .torrent files without prohibitive server or bandwidth requirements. Hosting a tracker does raise some bandwidth concerns, however, especially if the tracker becomes popular and begins to see heavy usage. Regardless, the tracker's bandwidth requirements are much less than hosting the complete files in a traditional client-server model such as one would encounter with an FTP site.

While trackers and .torrent files serve as mechanisms to assist the BitTorrent protocol, the process of actually transferring data is handled by the peers engaged in the swarm. Cohen's vision of “tit for tat” is the sole incentive measure he saw necessary for the protocol's success. Peers seek tit for tat behavior from others and discourage free riding by a “choke/unchoke” policy. This choke policy uses a process known as “optimistic unchoking” to constantly seek other swarm peers who may have more beneficial connections to offer an interested peer. There has been some research on the tit for tat algorithm by modeling rational users whose behavior is then studied. This work defined rational users as those peer nodes manipulating their client software beyond default settings. The fact that many newer BitTorrent clients allow for custom tweaking of specific upload or download speeds indicates that perhaps the original tit for tat coding was too good, and thus detrimental to other peer node functions such as normal HTTP traffic. Some BitTorrent FAQs recommend limiting uploads to approximately 80% of known capacity and personal tests indicate this strategy does benefit download speeds.

The final important aspect of the BitTorrent protocol's architecture is its use of a “rarest piece first” algorithm when a peer begins a file download. The rarest first algorithm has as its goal the uniform distribution of data across peers, also known as the “endgame mode”. A rarest first policy requires a seed to upload new file chunks (those not yet uploaded to a swarm) to the newest peer connecting to a torrent. This policy encourages distribution of the file further across peer nodes. The rarest first algorithm is an interesting aspect of BitTorrent that when combined with optimistic unchoking may explain why the protocol has achieved such success.

5. Terminology

These are the common terms that one would come across while making a typical BitTorrent file transfer.

• Torrent : This refers to the small metadata file you receive from the web server (the one that ends in .torrent). Metadata here means that the file contains information about the data you want to download, not the data itself.
• Peer : A peer is another computer on the Internet that you connect to and transfer data with. Generally a peer does not have the complete file.
• Leeches : They are similar to peers in that they do not have the complete file. The main difference between the two is that a leech will not upload once the file is downloaded.
• Seed : A computer that has a complete copy of a certain torrent. Once a client downloads a file completely, he can continue to upload the file, which is called seeding. This is a good practice in the BitTorrent world since it allows other users to obtain the file easily.
• Reseed : When there are zero seeds for a given torrent, eventually all the peers will get stuck with an incomplete file, since no one in the swarm has the missing pieces. When this happens, a seed must connect to the swarm so that those missing pieces can be transferred. This is called reseeding.
• Swarm : The group of machines that are collectively connected for a particular file.
• Tracker : A server on the Internet that acts to coordinate the action of BitTorrent clients. The clients are in constant touch with this server to know about the peers in the swarm.

... choking the connections it was just using. This is called optimistic unchoking.

6. Architecture of BitTorrent

Fig 5.1 : Architecture of a BitTorrent System

The BitTorrent protocol can be split into the following main components:

• Tracker - A server which helps manage the BitTorrent protocol.
• Peers - Users exchanging data via the BitTorrent protocol.
• Data - The files being transferred across the protocol.
• Client - The program which sits on a peer's computer and implements the protocol.

Peers use TCP (Transmission Control Protocol) to communicate and send data. This protocol is preferable over other protocols such as UDP (User Datagram Protocol) because TCP guarantees reliable and in-order delivery of data from sender to receiver. UDP cannot give such guarantees, and data can become scrambled or lost altogether. The tracker allows peers to query which peers have what data, and allows them to begin communication.
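
A tracker query of the kind just described can be sketched in Python. This is only a minimal illustration under stated assumptions, not a client implementation: the tracker hostname is hypothetical, the helper name `build_announce_url` is invented for this example, and the parameter names (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`) are assumed from the public BitTorrent specification rather than taken from this text.

```python
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port,
                       uploaded=0, downloaded=0, left=0):
    """Return the HTTP GET URL a client could request from a tracker.

    Hypothetical helper for illustration; parameter names are assumed
    from the BitTorrent specification.
    """
    params = {
        "info_hash": info_hash,    # 20-byte SHA-1 identifying the torrent
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,              # TCP port the client listens on
        "uploaded": uploaded,      # bytes uploaded so far
        "downloaded": downloaded,  # bytes downloaded so far
        "left": left,              # bytes still needed to complete the file
    }
    # urlencode percent-escapes raw bytes and joins the pairs with "&";
    # the query string is attached to the tracker URL with "?".
    return tracker_url + "?" + urlencode(params)


# Example values only: the host is made up, 6969 is a commonly used tracker port.
url = build_announce_url(
    "http://tracker.example.org:6969/announce",
    info_hash=bytes(range(20)),
    peer_id=b"-XX0001-123456789012",
    port=6881,
    left=1024,
)
print(url)
```

The tracker's reply, which lists other peers in the swarm, is deliberately left out of this sketch.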
Peers communicate with the tracker in plain text via HTTP (Hypertext Transfer Protocol). The following diagram illustrates how peers interact with each other, and also communicate with a central tracker.

6.1 Tracker

Fig : Tracker

A tracker is used to manage users participating in a torrent (known as peers). It stores statistics about the torrent, but its main role is to allow peers to 'find each other' and start communication, i.e. to find peers with the data they require. Peers know nothing of each other until a response is received from the tracker. Whenever a peer contacts the tracker, it reports which pieces of a file it has. That way, when another peer queries the tracker, it can provide a random list of peers who are participating in the torrent and have the required piece.

A tracker is an HTTP/HTTPS service and typically works on port 6969. The address of the tracker managing a torrent is specified in the metainfo file. A single tracker can manage multiple torrents. Multiple trackers can also be specified, as backups, which are handled by the BitTorrent client running on the user's computer. BitTorrent clients communicate with the tracker using HTTP GET requests, which is a standard CGI method. This consists of appending a "?" to the URL, and separating parameters with a "&".

6.2 Peers

Peers are other users participating in a torrent, and have the partial file, or the complete file (known as a seed). Pieces are requested from peers, but are not guaranteed to be sent, depending on the status of the peer. BitTorrent uses TCP (Transmission Control Protocol) ports 6881-6889 to send messages and data between peers, and unlike other protocols, does not use UDP (User Datagram Protocol).

6.3 Data

BitTorrent is very versatile, and can be used to transfer a single file, or multiple files of any type, contained within any number of directories. File sizes can vary hugely, from kilobytes to hundreds of gigabytes.

6.4 BitTorrent Clients

A BitTorrent client is an executable program which implements the BitTorrent protocol. It runs on top of the operating system on a user's machine, and handles interactions with the tracker and peers. The client is responsible for controlling the reading and writing of files, opening sockets, etc. A metainfo file must be opened by the client to start partaking in a torrent. Once the file is read, the necessary data is extracted, and a socket must be opened to contact the tracker. BitTorrent clients use TCP ports 6881-6889. To find an available port, the client will start at the lowest port, and work upwards until it finds one it can use. This means the client will only use one port, and opening another BitTorrent client will use another port. A client can handle multiple torrents running concurrently. Clients come in many flavours, and can range from basic applications with few features to very advanced, customisable ones. For example, some advanced features are metainfo file wizards and inbuilt trackers. These additional features mean that different clients behave very differently, and may use multiple ports, depending on the number of processes they are running. As all applications implement the same protocol, there are no incompatibility issues; however, because of various tweaks and improvements between clients, a peer may experience better performance from peers running the same client.

7. Conclusion

BitTorrent pioneered mesh-based file distribution that effectively utilizes all the uplinks of participating nodes. Most follow-on research used similar distributed and randomized algorithms for peer and piece selection, but with different emphasis or twists. This work takes a different approach to the mesh-based file distribution problem by considering it as a scheduling problem, and strives to derive an optimal schedule that could minimize the total elapsed time. By comparing the total elapsed time of BitTorrent and CSFD in a wide variety of scenarios, we are able to determine how close BitTorrent is to the theoretical optimum. In addition, the study of the applicability of BitTorrent to real-time media streaming applications shows that with minor modifications, BitTorrent can serve as an effective media streaming tool as well. BitTorrent's application in this information sharing age is almost priceless. However, it is still not perfected, as it remains prone to malicious attacks and acts of misuse. Moreover, the lifespan of each torrent is still not satisfactory, which means that a file's distribution can only survive for a limited period of time. Thus, further analysis and a more thorough study of the protocol will enable one to discover more ways to improve it.