
Annals of Business Administrative Science 19 (2020) 277–292

https://doi.org/10.7880/abas.0200908a
Received: September 8, 2020; accepted: October 16, 2020
Published in advance on J-STAGE: December 5, 2020

Deep Web, Dark Web, Dark Net:
A Taxonomy of “Hidden” Internet
Masayuki HATTAa)

Abstract: Recently, online black markets and anonymous virtual
currencies have become the subject of academic research, and we
have gained a certain degree of knowledge about the dark
web. However, as terms such as deep web and dark net, which
have different meanings, have been used in similar contexts,
discussions related to the dark web tend to be
confusing. Therefore, in this paper, we discuss the differences
between these concepts in an easy-to-understand manner,
including their historical circumstances, and explain the technology
known as onion routing used on the dark web.

Keywords: anonymity, deep web, dark web, dark net, privacy

a) Faculty of Economics and Management, Surugadai University. 698 Azu, Hanno, Saitama, Japan. hatta.masayuki@surugadai.ac.jp
A version of this paper was presented at the ABAS Conference 2020 Summer (Hatta, 2020b).
© 2020 Masayuki Hatta. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any
medium, provided the original work is properly cited.


Introduction
In recent years, the term “dark web” has become popular. The dark web, i.e., a part of the World Wide Web on which anonymity is guaranteed and which cannot be accessed without special software, was, until recently, of interest to only a few curious people. However, in 2011, the world’s largest online black market, Silk Road (Bilton, 2017), was established on the dark web; with the presence of virtual currencies, which incorporate the anonymity provided on the dark web (Todorof, 2019), it has become a topic of economic and business research. Words similar to “dark web” (such as “deep web” and “dark net”) are used in the same context, but they are completely different technical concepts; this leads to confusion.

Deep Web
Historically, among the three terms (“dark web,” “deep web,” and “dark net”), the term “deep web” was the first to emerge. The computer technician and entrepreneur Michael K. Bergman first used it in his white paper “The deep web: Surfacing hidden value” (Bergman,
2001). Bergman likened web searches to the fishing industry and
stated that legacy search engines were nothing more than fishing
nets being dragged along the surface of the sea, even though there is
a lot of important information deep in the sea, where the nets do not
reach. Therefore, he stated that, moving forward, it was important to
reach deep areas as well. This was the advent of the deep web.
Bergman stated that the deep web was 400–550 times larger than the normal web, and that the information found in the deep web was 1,000–2,000 times the quality of that on the normal web. The problem is that these figures are still cited today in the context of the dark web. What Bergman
(2001) first raised as detailed examples of the “deep” web were the
National Oceanic and Atmospheric Administration (NOAA) and
United States Patent and Trademark Office (USPTO) data, JSTOR
and Elsevier fee-based academic literature search services, and the
eBay and Amazon electronic commerce sites; these are still referred
to as the “deep web” today. In short, Bergman referred to the
following as the deep web:

(a) Special databases that could only be accessed within an organization
(b) Sites with paywalls wherein content can only be partly seen or not seen at all without registration
(c) Sites in which content is dynamically generated each time they are accessed
(d) Pages that cannot be accessed without using that site’s search system
(e) Email and chat logs

That is to say, it refers to a Web that normal search engines, such as Google, cannot crawl or index.
Incidentally, according to Bergman, the term “invisible web” was already in use in 1994, in the sense of a web that could not be searched by a search engine. However, Bergman asserted that the deep web was just deep, not “invisible,” i.e., it could be searched given the right innovations. The start-up that he was managing at that time
was selling this very technology. Furthermore, following this, Google
formed a separate agreement with the owners of databases and
started the Google Books project with university libraries, becoming
involved in “deep” field indexing; thus, in 20 years, the
deep web—in the sense that Bergman used it—is considered to
have shrunk considerably.
In this manner, originally the “deep” in deep web was simply
somewhere that was deep and difficult to web-crawl, and did not
contain nuances of good or evil. Despite this, “deep” is a powerful
word and, as will be described later, this has led the way in
entrenching the image associated with the dark web as something
thick and murky.

Dark Net
The term “dark net” became popular at virtually the same time as the term “dark web” did. There is a hypothesis that the term has been in use since the 1970s, and even today, in concrete terms, a range of IP addresses not allocated to any host computer is referred to as a dark net. However, the trigger for its use as a general term, as it is now, was a paper written in 2002 (published in 2003) by four engineers, including Peter Biddle (who was working at Microsoft at that time), that called the dark net the future of content distribution (Biddle, England, Peinado, & Willman, 2003).
Sweeping the world at that time were the P2P file-sharing services Napster (started in 1999) and Gnutella (released in 2000). Operation of File Rogue started at around the same time in Japan. There were fears of copyright infringement, and in the paper, written as part of research on Digital Rights Management (DRM) and copy protection (Biddle et al., 2003), the term “dark net” was clearly used in the negative sense of illegal activity.
Biddle et al. (2003) broadly defined dark net as “a collection of
networks and technologies used to share digital content” (Biddle et
al., 2003, p. 155). Based on this, it can be summarized as follows.

(1) This started with the manual carrying of physical media such
as CDs, DVDs, and more recently, USB memory—the
so-called “Sneakernet.”
(2) With the spread of the Internet, files such as music files began
to be stored on one server, giving birth to the “central server”
model. However, if the central server were destroyed, that
would be the end.

(3) With Napster and Gnutella, files or parts of files came to be shared across multiple servers that communicated with each other as peers; a Peer-to-Peer (P2P) model appeared, meaning that even if one point of the network were destroyed, the network as a whole would survive.

This P2P model was realized on the existing physical network,
using technology known as an overlay network that utilizes
non-standard applications and protocols.
Additionally, Biddle et al. (2003) noted that because Napster had a central server for searching, it could be controlled through that server. Moreover, although Gnutella was completely distributed, the individual peers were not anonymous and their IP addresses could be learned, so it was possible to track them and hold them legally responsible. In this way, measures could be taken against the P2P file sharing of the time, but it was predicted that a new dark net that overcame these weaknesses would emerge.
Biddle et al. (2003) considered that content, even if copy-protected, could be widely diffused via the dark net, and that the dark net would continue to evolve. They reached the conclusion that DRM was fundamentally meaningless and that, to eradicate pirated versions, official versions needed to be reasonably priced and convenient for customers, competing on the same ground. This pronouncement put the jobs of Biddle et al. at risk (Lee, 2017). However, considering that attempts to counter piracy through copyright enforcement have continually failed, and that piracy is now being driven back by the emergence of superior platforms such as Netflix and Spotify, their pronouncement has proven to be correct.


Yet Another Dark Net: F2F


Possibly due to the fact that “dark net” is an attractive name, around the same time as Biddle et al. (2003), the term began to be used as a general term for a slightly different technology. This technology is called Friend-to-Friend (F2F),1 and because it was implemented as the Darknet mode of Freenet, one of the main dark web software implementations (described later), it also became known as Darknet.
In this sense, Darknet, or F2F, is a type of P2P network, and the
user only directly connects with acquaintances (in many cases, they
have met in real life and built up trust via a non-online route). A
password or digital signature is used for authentication. The basic
concept behind F2F is that a P2P overlay network is constructed over
the existing relationships of trust between users. This is a method in which the network is subdivided: rather than connecting to an unspecified large number of people, each user connects to a much smaller group of, say, five people whom they know well and trust, as shown as an example in Figure 1. In this sense, the term Opennet, used in Figure 2, is an antonym of Darknet.

Figure 1. Topology of Darknet. A participant with malicious intent (e.g., a red one) cannot easily understand the entire network. Source: the author.

Figure 2. Topology of Opennet. The outside observer can understand the entire network thanks to the existence of a directory server. Source: the author.

1 The term F2F itself was invented in the year 2000 (Bricklin, 2000).

Overlay network

Here, an overlay network is a general term for a network constructed “over” a separate network. Typically, it refers to a computer network constructed over the Internet. The problem in this case is one of routing. On the Internet, based on TCP/IP, it is possible to reach other servers by specifying an IP address. However, in the case of an overlay network, this IP address is not necessarily known or usable, so technology such as a Distributed Hash Table (DHT) is utilized to route to an existing node using a logical address.
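To make the idea of routing by logical address concrete, below is a minimal sketch of a DHT-style lookup in Python. The node names, the 16-bit ID space, and the global view of the ring are illustrative simplifications; a real DHT such as Chord resolves lookups iteratively using partial routing tables rather than a complete node list.

```python
# A toy hash-ring lookup illustrating routing by logical address.
import hashlib

def logical_id(name: str, bits: int = 16) -> int:
    """Map a node name or content key into a logical address space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

# Overlay nodes indexed by their logical (not IP) addresses.
nodes = {logical_id(n): n for n in ("alice", "bob", "carol", "dave")}

def lookup(key: str) -> str:
    """Route to the successor node: the first node at or after the key's ID."""
    kid = logical_id(key)
    ring = sorted(nodes)
    for nid in ring:
        if nid >= kid:
            return nodes[nid]
    return nodes[ring[0]]  # wrap around the ring

print(lookup("some-file.dat"))  # the node responsible for this key
```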
In F2F, each user operates as an overlay node. Contrary to an Opennet P2P network, with F2F it is not possible to connect to an arbitrary node and exchange information. Instead, each user manages “acquaintances” that they trust and establishes safe and authenticated channels only with a fixed number of other nodes.
As pointed out by Biddle et al. (2003), in the Gnutella network, for
example, there was the problem that the attributes of network
participants, such as IP addresses, were known by all network
participants. The participants could be infiltrated by police or
information agencies, and if their attributes are known, there is the
danger of them being tracked and of legal action being taken against
them. Additionally, as connections concentrate on powerful nodes with an abundance of network resources, as shown in Figure 3, if such a node is run by an adversary, the adversary can grasp an overall image of the network in a so-called “harvesting” attack. With F2F, it is possible to create a P2P network that can withstand harvesting.
Figure 3. Harvesting attack. If there is a powerful server (possibly run by adversaries), all nodes would try to connect to that server; thus, the entire network would be revealed. Source: the author.

Contrary to other dark web implementations such as Tor or I2P (described later), F2F network users are unable to know who is participating in the network other than their own “acquaintances,” so the scale of the network can be expanded without losing the anonymity of the network as a whole. In other words, “dark” here means that it is difficult to see and grasp an overall image of the network.
In a “simple” F2F network, there is no path that reaches beyond direct “acquaintances,” so the average network distance is infinite. However, indirect anonymous communication between users who do not know or trust each other is supported: even between nodes that have not established trust, as long as there is a common node that is an acquaintance of both, the two can communicate anonymously by going via that node (a small-world network).
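As a rough illustration of this small-world routing over trust edges, here is a minimal sketch in Python; the friend graph and the user names are invented for the example.

```python
# Finding a chain of acquaintances between two users who do not
# trust each other directly. Every hop is a direct F2F connection.
from collections import deque

friends = {
    "alice": {"bob"},
    "bob":   {"alice", "carol"},  # bob and carol are the intermediaries
    "carol": {"bob", "dave"},
    "dave":  {"carol"},
}

def route(src: str, dst: str):
    """Breadth-first search restricted to trust edges."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in friends[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no chain of acquaintances connects them

print(route("alice", "dave"))  # ['alice', 'bob', 'carol', 'dave']
```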
It is interesting that both the dark net described herein and social networks, now at the peak of their prosperity, were established once this small-world phenomenon (Watts & Strogatz, 1998) became commonplace. It can be said that the dark net is like a twin sibling of social networks.

Dark Web
It is unclear when the dark web first appeared. The term dark web began to be used around 2009, but conflation with the deep web was already seen at that time (Beckett, 2009). To understand the dark web,
which is different from the deep web and dark net (which are
comparatively simple, technologically), an understanding of
computer network basics is required.

Internet basics

On the Internet, “access” is realized by the exchange of a large number of messages between a client at the user’s location and a server in a remote location. For example, when viewing a website using a web browser, a request message saying “send the data for this page here” is sent from the viewer’s computer to the web server in accordance with fixed rules (known as “protocols”). The web server receiving this message then sends the requested data.
At this time, the message is minutely subdivided into pieces of data of a fixed size, called “packets,” and these are exchanged with data called a “header” attached to the start of each fragment, in which control information such as the IP addresses of the sender and destination is described. The side receiving the packets joins them together, reconstructs the message, and acts accordingly.
On the Internet, such packets are sent in a packet relay via many
server machines to the destination server. This type of packet flow is
called “traffic.” Looking at the header and deciding where and how to
send the packet is known as “routing,” and the general name given to
devices and software that make these decisions and transmit these
packets is “routers.”
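As a minimal sketch of the packet mechanism just described, the following Python fragment splits a message into fixed-size “packets,” each carrying a header that names the sender and destination. The field names and the tiny packet size are illustrative, not the real IP header layout.

```python
# Splitting a message into header-carrying packets and reassembling it.
from dataclasses import dataclass

MTU = 8  # unrealistically small payload size, to force fragmentation

@dataclass
class Packet:
    src: str        # sender IP address (header field)
    dst: str        # destination IP address (header field)
    seq: int        # position used to reassemble the message
    payload: bytes  # one fixed-size fragment of the message

def fragment(message: bytes, src: str, dst: str) -> list:
    return [Packet(src, dst, i, message[i:i + MTU])
            for i in range(0, len(message), MTU)]

def reassemble(packets: list) -> bytes:
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))

pkts = fragment(b"send the data for this page here", "192.0.2.1", "203.0.113.9")
assert reassemble(pkts) == b"send the data for this page here"
```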

Assumed anonymity versus real anonymity

The above text describes the mechanism by which data on the Internet (described as the Clearnet, in contrast to the dark net) is exchanged; however, when the Internet is accessed, a “record” always remains.
remains. For example, if you view a website from a PC, the server
hosting this website will have a record (access log) showing at what
hour and what minute this page was accessed and from where. In
many cases, the only thing recorded is the identifier number, known
as the IP address. IP addresses are allocated to individual
communications devices, and as the IP address assigned may change
with every connection, it may be difficult to identify the location and
person involved using the IP address alone. However, as you will
know the Internet service provider (ISP) used by the device for this
connection, you can then get information on the contracted party
from the ISP. So, you can trace each step back one by one.
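For illustration, the kind of record at issue looks roughly like the common web-server log line below (the address and timestamp are invented); the IP address at the start is usually the only identifier of the visitor.

```python
# Parsing the sender's IP address out of an access-log line.
log_line = ('198.51.100.23 - - [08/Sep/2020:14:31:07 +0900] '
            '"GET /page.html HTTP/1.1" 200 5124')

ip = log_line.split()[0]  # the visitor's IP address
print(ip)  # 198.51.100.23 -> the ISP knows who held this address and when
```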

The log is often stored on the server side for a fixed period of time (in many cases, from three months to one year or more). Therefore, if an investigatory body obtains the log from the organization managing the server, the ISP, etc., it can start to track down the sender. Of course, there are issues with freedom of expression and
secrecy of communications, so even investigatory authorities are
unable to acquire sender information in an unlimited way. However,
in cases involving requests for disclosure of sender information based
on the Law on Restrictions on the Liability for Damages of Specified
Telecommunications Service Providers, identification of information
senders by the police or the prosecution after acquiring a warrant
from the courts is an everyday occurrence.

Anonymization by Tor: Onion routing

Therefore, there are systems, such as Tor, that make it difficult for
information senders to be identified. Tor is software designed to
enable Internet users to communicate while maintaining anonymity
and has been named based on the initial letters for “The Onion
Router.” Tor is open-source software that runs on various platforms2 and can be freely obtained by anyone, free of charge. In recent years, the development of Tor has proceeded on a volunteer basis. Originally, however, the underlying technology was developed at a US Navy research laboratory in the mid-1990s.
Tor adds a tweak to the basic mechanism of data exchange over the
normal Internet. Tor constructs a virtual network over the Internet
and functions as a special router over this network. This is a unique
form of routing and, as the name suggests, it uses a type of
technology known as “onion routing.”
For example, just as a postal item cannot be delivered if its destination is written in code, the sender IP address in the packet header is described as unencrypted data (plain text). Thus, the access log can be captured on the server. Additionally, as the destination IP address is also described in plain text, it is possible to lie in wait on a server along the traffic route, i.e., the packet relay, eavesdrop on the packet headers as they pass, and statistically analyze the type and frequency of access, exposing the sender’s identity. For example, if somebody in the outback of Afghanistan frequently accesses a US Army-related server, then even without knowing the content of the communications, an observer can infer a high possibility that an intelligence agent who has infiltrated the area is communicating with the US Army. The general name for this type of method is traffic analysis.

2 For the development process of open source software, refer to Hatta (2018, 2020a), etc.
Onion routing was invented as a means of countering such traffic analysis.3 If you want to access a particular server anonymously, you install Tor on your own computer, change the proxy settings of your web browser, and set all packets leaving your computer to pass through Tor. If you do so, Tor will provide you with anonymity based on the following steps.
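As a minimal sketch of this setup, the snippet below sends an HTTP request through a locally running Tor client. It assumes Tor is listening on its default SOCKS port 9050 and that the requests package is installed with SOCKS support (pip install requests[socks]).

```python
# Routing a web request through Tor's local SOCKS proxy.
import requests

proxies = {
    "http":  "socks5h://127.0.0.1:9050",  # socks5h: DNS is also resolved via Tor
    "https": "socks5h://127.0.0.1:9050",
}

# The destination server sees the exit node's IP address, not yours.
r = requests.get("https://check.torproject.org/", proxies=proxies)
print(r.status_code)
```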

Step 1: Choosing relay nodes randomly

Tor, which picks up packets leaving your computer, obtains from directory servers on the Internet a list of IP addresses of servers running the Tor onion router (called Tor nodes or relays) and selects at least three of these nodes at random. If the selected nodes are labeled Tor node A, Tor node B, and Tor node C, routing is then performed on your own computer as follows.

Your computer → Tor node A → Tor node B → Tor node C → destination server

3 I2P, known as a non-Tor implementation of the dark web, uses garlic routing, an improved version of onion routing. Freenet uses a different algorithm, but it is basically the same as onion routing.

The packet is then sent along this route. (Tor node C is the terminal point of the virtual network created by Tor; as it is the connecting point that rejoins the ordinary Internet, it is specifically referred to as the exit node.)
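Step 1 amounts to a random choice of three distinct relays from the directory’s list, as in the sketch below. The relay addresses are made up; a real Tor client works from a signed consensus document and weights its choices rather than sampling uniformly.

```python
# Randomly picking an entry, middle, and exit node from a relay list.
import random

relays = ["203.0.113.5", "198.51.100.17", "192.0.2.44",
          "203.0.113.77", "198.51.100.3", "192.0.2.200"]

node_a, node_b, node_c = random.sample(relays, 3)  # three distinct relays
print("route:", node_a, "->", node_b, "->", node_c, "-> destination server")
```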

Step 2: Peeling the onion at each node

Next, Tor attaches Tor headers to the packets to be sent. For each node, a header is attached containing the IP address of the Tor node to which the packet should be passed next.

A) Tor node A header: Tor node B IP address
B) Tor node B header: Tor node C IP address
C) Tor node C header: destination server IP address

In practice, as shown in Figure 4, the headers are wrapped from the inside out, in reverse order. First, for C), the IP address of the destination server to which you want to send the packet is written in the Tor node C header, and the whole thing is encrypted with a key that can only be decoded by Tor node C. On top of that, B) the Tor node B header is attached, and the whole thing is encrypted with a key that can only be decoded by Tor node B. The same process is carried out for A) the Tor node A header. In other words, the layer closest to the final destination is placed on the inside, and each layer is locked with a key so that, at each stage, it can be decoded only by a specific Tor node. The packet constructed in this way is passed to Tor node A. Each Tor node decodes the header of the packet it receives and transfers it, as in a packet relay, to the next node written there.

Figure 4. Onion. Source: the author.
In this way, it is as if the skin of an onion is being peeled off one layer at a time: each node opens and decodes only the header meant for itself and passes the packet on to the next node. This is the reason it is called “onion” routing. The peeled-off skin, i.e., the header a node has decoded for itself, is discarded by that node.
Because each header is discarded by the node that peels it, Tor node C knows that a packet has come from Tor node B, but it does not know that the node before Tor node B was Tor node A. Similarly, Tor node A knows that it has received a packet from the departure point and that it must pass the packet to Tor node B, but it cannot know the nodes after that, as the remaining contents of the packet are encrypted with Tor node B’s key. Most importantly, from the destination server’s perspective, the packet does not appear to come from the departure point but from Tor node C. Therefore, even if an access log is kept on the destination server, what is recorded in the log is the IP address of the exit node, Tor node C, and not the departure point’s IP address. Additionally, Tor node C is simply a Tor node selected at random, and there is effectively nothing linking the departure point and Tor node C.
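The layering and peeling described above can be sketched in a few lines of Python using the cryptography package’s Fernet cipher (pip install cryptography). The pre-shared keys, node names, and JSON message format are illustrative simplifications; real Tor negotiates session keys per circuit and uses its own cell format.

```python
# Wrapping a message in onion layers and peeling one layer per node.
import json
from cryptography.fernet import Fernet

# One symmetric key per relay (a stand-in for per-circuit session keys).
keys = {name: Fernet.generate_key() for name in ("A", "B", "C")}

def wrap(payload: bytes, route: list, destination: str) -> bytes:
    """Build layers from the inside out: the innermost layer names the
    final destination; each outer layer names the next hop."""
    next_hop, packet = destination, payload
    for name in reversed(route):  # C first, then B, then A
        layer = json.dumps({"next": next_hop, "data": packet.decode()})
        packet = Fernet(keys[name]).encrypt(layer.encode())
        next_hop = name
    return packet

def peel(name: str, packet: bytes):
    """A relay decrypts only its own layer and learns only the next hop."""
    layer = json.loads(Fernet(keys[name]).decrypt(packet))
    return layer["next"], layer["data"].encode()

onion = wrap(b"GET /", route=["A", "B", "C"], destination="server")
hop = "A"
while hop in keys:        # each relay peels one layer and forwards
    hop, onion = peel(hop, onion)
print(hop, onion)         # -> server b'GET /'
```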


Table 1. Deep web, dark web, and (two) dark nets

Type             The opposite                      Strength of anonymity
Deep web         WWW                               None
Dark net         Legitimate content distribution   Medium
Dark net (F2F)   Opennet                           Strong
Dark web         Clearnet                          Strong

Source: the author.

Conclusion
In this paper, we have provided a simple explanation of the
deep web, dark web, and (two) dark nets, including their technical
aspects. These concepts, as shown in Table 1, are easy to understand
based on what each one is the opposite of. Discussions of the dark web and the like tend to be hampered by the images these words evoke. When researching them moving forward, a precise understanding based on their historical and technical nature is required.

Acknowledgments
This work was supported by JSPS Grant-in-Aid for Publication of
Scientific Research Results, Grant Number JP16HP2004.


References
Beckett, A. (2009, November 26). The dark side of the internet. The
Guardian. Retrieved from http://www.theguardian.com/technology/
2009/nov/26/dark-side-internet-freenet
Bergman, M. K. (2001). White paper: The deep web: Surfacing hidden
value. Journal of Electronic Publishing, 7(1). doi:
10.3998/3336451.0007.104
Biddle, P., England, P., Peinado, M., & Willman, B. (2003). The darknet
and the future of content protection. In J. Feigenbaum (Ed.), Digital
rights management (pp. 155–176). Berlin, Heidelberg, Germany:
Springer. doi: 10.1007/978-3-540-44993-5_10
Bilton, N. (2017). American kingpin: The epic hunt for the criminal
mastermind behind the Silk Road. New York, NY: Portfolio/Penguin.
Bricklin, D. (2000, August 11). Friend-to-Friend Networks [Web log
message]. Retrieved from http://www.bricklin.com/f2f.htm
Hatta, M. (2018). The role of mailing lists for policy discussions in open
source development. Annals of Business Administrative Science, 17,
31–43. doi: 10.7880/abas.0170904a
Hatta, M. (2020a). The right to repair, the right to tinker, and the right to
innovate. Annals of Business Administrative Science, 19,
143–157. doi: 10.7880/abas.0200604a
Hatta, M. (2020b, August). Deep web, dark web, dark net: A taxonomy of
“hidden” Internet. Paper presented at ABAS Conference 2020 Summer,
University of Tokyo, Japan.
Lee, T. B. (2017, November 24). How four Microsoft engineers proved that
the “darknet” would defeat DRM [Web log message]. Retrieved from
https://arstechnica.com/tech-policy/2017/11/how-four-microsoft-e
ngineers-proved-copy-protection-would-fail/
Todorof, M. (2019). FinTech on the dark web: The rise of cryptos. ERA
Forum, 20(1), 1–20. doi: 10.1007/s12027-019-00556-y
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’
networks. Nature, 393(6684), 440–442. doi: 10.1038/30918
