Session: Security Analytics, Dark Web IWSPA ’19, March 27, 2019, Richardson, TX, USA
card information, and the like, are rampant. It also provides cover for DDoS tools such as Booter and other forms of cyberattack [4]. However, if all of the illegal sites on the Dark Web could be avoided beforehand, it would not be a dangerous place. Like the Surface Web, it is primarily an environment for simple blogs, community sites, forums, and so on.

2 ABOUT TOR
Tor is the onion routing anonymization network devised by employees at the US Naval Research Laboratory (NRL). It is implemented by combining peer-to-peer (P2P) technology, in which clients communicate with each other in one-to-one relationships, with SOCKS technology for relaying TCP/IP communications. Tor’s relay nodes are spread all over the world, and the network currently consists of more than 6,000 servers [5]. Since those relay nodes do not keep log information, node connections are changed at fixed intervals, and all communications other than the exit data are encrypted, the communication source cannot be determined and the level of anonymity provided is very high. Figure 1 shows a summary of an access route in Tor.

Any research team breaking these guidelines, even if they are engaged in classification research related to malicious anonymous services or simply collecting Onion addresses, faces permanent banning from the Tor relay network [8]. In consideration of those guidelines, our research refrained from using HSDir or exit nodes. Instead of collecting Onion domains en masse ourselves, we performed our analyses using the “Ichidan” [9] search engine and the open source Tor crawler “Fresh Onions” [10]. Hence, we can assert beforehand that our actions did not violate the ethical guidelines provided above.

Ichidan is a search engine for the Dark Web similar to Shodan [11]. Shodan is a website that collects and publishes information on devices connected to the Internet by conducting port scans and banner surveys [12]. Fresh Onions, a Tor hidden service crawler, is an open source project published on GitHub, a web-based hosting service for version control using Git [13]. Git is a version-control system for tracking changes in computer files and coordinating work on those files among multiple people.

3 DARK WEB INCIDENTS
In this section, we will begin by discussing an incident relevant to the Dark Web that occurred in Japan and involved the Coincheck exchange. Specifically, the Coincheck site was accessed illegally around the end of January 2018, and an amount of NEM cryptocurrency (XEM) equivalent to about 58 billion yen ($500 million) was misappropriated. After that, the person believed to be the perpetrator set up a virtual currency trading site on the Dark Web and used it to exchange NEM for other virtual currencies. Eventually, all of the misappropriated cryptocurrency was traded away. Figure 2 shows a screenshot of a website established by a person believed to be the perpetrator.

The next incident relevant to the Dark Web that we will explore occurred overseas and involved investigators from the security company 4iQ [14]. Those investigators reported finding files containing 41 gigabytes of data, including 1.4 billion username and password pairs [15]. Since those passwords were stored in
plaintext, the accounts they were supposed to protect could have been easily exploited if that information had fallen into malicious hands.

4 RELATED WORK
In a study aimed at elucidating illegal content on the Dark Web, Zulkarnine et al. [16] reported on the development of a system called Web Crawler that automatically crawls websites on the Internet based on predefined keywords, and on its application to the Dark Web. Their investigations uncovered numerous illegal websites linked to extremists and terrorist groups, and showed their mutual interconnections. In investigations based on popular keywords, they found it possible to map the Dark Web to a network graph, compare the scores of incoming and outgoing degrees, and discover the popular websites at the center. One manual investigation of popular Dark Web websites uncovered links to an illegal Russian drug market and to documents for connecting to the Tor network. There were also political blogs and other sites claiming top-ranking positions in anti-US government movements. That investigation differed from this research in the manner in which websites were classified and contrasted. More specifically, that study aimed at the early detection and monitoring of the websites of radical groups and groups espousing anti-American positions, while this research aims primarily at roughly classifying websites currently existing on the Dark Web into six different categories. Furthermore, even though the network graph visualization method used is the same as that employed in the abovementioned research, this study demonstrates how colors can be used to mark the category of each node in a way that allows the connections between content types to be more easily grasped, clarified, and presented.

Ono et al. [17] proposed a Dark Web analysis system that uses HSDir snooping to gather Onion addresses. More specifically, when a user connects to an Onion domain using an anonymizing service, the client needs to access an HSDir to obtain information such as introduction points. By installing an HSDir to conduct observations, it therefore becomes possible to collect information on the Onion domains that clients actually access. Using this structure, information on the servers hosting Dark Web services can be collected by performing a service scan of the collected Onion domains. The results obtained via such scanning made it clear that many servers expose services in unsuitable states. When the Onion domains were classified into three categories according to the anonymity of the service, the most anonymous group was given the highest ranking. In addition, the authors manually accessed the 15 Onion addresses with the most queries and confirmed their content; these were found to be black market sites dealing with illegal drugs. The research described in this paper differs from that study in that we do not install observation equipment such as honeypots into the Tor network, perform all analyses using only open data, and do not engage in active scanning of the Dark Web. Furthermore, the abovementioned study uses the frequency of access attempts as a method of finding illegal sites, whereas our study is based solely on the usage frequency of predefined keywords.

5 DARK WEB ANALYSIS
This section explains the purpose behind our proposed technique for analyzing the Dark Web. In this research, in addition to the illegal sites that are presumed to be on the Dark Web, we attempted to classify all of the detected sites into rough categories, including the non-illegal sites that are often seen when browsing the Dark Web.

Then, based on how the categories are interconnected, we attempt to discover previously unknown Onion domains by analyzing the detected links, and to classify the newly detected websites into categories without actually accessing the newly detected Tor domains. We will begin by providing a detailed, step-by-step overview of the procedure for analyzing Dark Web content, after which we will show the results.

1 Numerous Onion domains are acquired.
1.1 Onion domain names on which “http” is valid are acquired using Ichidan.
1.2 Domain names added to the Fresh Onions list within one year are acquired, instead of all of the Onion domains it lists.
2 All acquired Onion domains are scraped, and the top page of each is downloaded.
3 Using the text on the top page, the data are roughly classified into six categories. Note that it is not possible to classify new domains at this point.
4 A directed graph is created from the hyperlinks in each page.
5 The category classification results are reflected in the created directed graph.

5.1 Onion domain acquisition
Ichidan accepts a protocol name, for example, in the query part of its search URL. In this case, “http://ichidanv34wrx7m7.onion?query=http” is used to acquire Onion domains on which HTTP is valid, and a list of those Onion domains is then enumerated. Note that, as of this study, Ichidan is offline and cannot be accessed. If “http://zlal32teyptf4tvi.onion/json/all” is accessed, the Fresh Onions data, including domain names, titles, dates added, etc., are downloaded and displayed in JavaScript Object Notation (JSON).

5.2 Scraping Onion domains
When Tor is running, port 9050 on localhost (Tor’s default SOCKS port) is in the LISTEN state. By configuring this port as a proxy, a program can connect to the Dark Web. A summary of the data acquired during this study is shown in Tables 1-1 and 1-2.
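The acquisition and scraping steps in 5.1 and 5.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used in this study: it assumes the third-party requests library with SOCKS support (installed as requests[socks]), Tor listening on its default SOCKS port 9050, and hypothetical JSON keys "domain" and "added" for the Fresh Onions listing, whose actual schema is not given here. The helper names are illustrative. The socks5h scheme makes Tor, rather than the local resolver, resolve host names, which is required because .onion names cannot be resolved through ordinary DNS.

```python
from datetime import datetime, timedelta

# Tor's default SOCKS port on localhost (see 5.2). "socks5h" asks the proxy
# to resolve host names, which is required for .onion addresses.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

FRESH_ONIONS_JSON = "http://zlal32teyptf4tvi.onion/json/all"


def recent_domains(entries, now, days=365):
    """Keep domains added within `days` of `now` (step 1.2), deduplicated.

    ASSUMPTION: each entry is a dict with hypothetical keys "domain" and
    "added" (an ISO 8601 date string); the real Fresh Onions schema may differ.
    """
    cutoff = now - timedelta(days=days)
    result = set()
    for entry in entries:
        added = datetime.fromisoformat(entry["added"])
        if added >= cutoff:
            result.add(entry["domain"])
    return sorted(result)


def fetch_listing(timeout=60):
    """Download the Fresh Onions JSON listing through Tor (step 1.2)."""
    import requests  # third-party; SOCKS proxies need requests[socks]

    response = requests.get(FRESH_ONIONS_JSON, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.json()


def fetch_top_page(domain, timeout=60):
    """Download the top page of one Onion domain through Tor (step 2)."""
    import requests  # third-party; SOCKS proxies need requests[socks]

    response = requests.get(f"http://{domain}/", proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With Tor running, recent_domains(fetch_listing(), datetime.now()) would yield a step-1.2 domain list (if the assumed keys match), and each domain could then be passed to fetch_top_page. The requests import is kept inside the network functions so that the filtering logic can be used and tested without network access.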
which refers to sites carrying news articles and political commentary.

5.5 Reflecting category classification results
The results of dividing the sites into the six categories described in Sec 5.3 were then reflected in the graph, and a color was applied to each category to make the created graphs visually easy to understand. Since the entire graph is large, enlarged portions are shown in Figure 6 and Figure 7.
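Steps 4 and 5 (building a directed graph from the hyperlinks in each top page and coloring nodes by category) can be sketched with Python's standard library. This is an illustrative sketch under assumptions: the category names and colors below are hypothetical placeholders, not the six categories of Sec 5.3; the graph is written in Graphviz DOT format, which is not necessarily the viewer used in this study; and any linked domain without a classification is treated as a new domain and drawn in a separate color.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OnionLinkParser(HTMLParser):
    """Collect the .onion host names referenced by <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links have no host name and are skipped.
                host = urlparse(value).hostname or ""
                if host.endswith(".onion"):
                    self.hosts.add(host)


def build_edges(pages):
    """pages maps a source domain to its top-page HTML; returns directed edges."""
    edges = set()
    for source, html in pages.items():
        parser = OnionLinkParser()
        parser.feed(html)
        for target in parser.hosts:
            if target != source:  # ignore links from a site to itself
                edges.add((source, target))
    return sorted(edges)


# HYPOTHETICAL category-to-color mapping; replace with the real categories.
CATEGORY_COLORS = {"market": "red", "forum": "lightblue", "news": "yellow"}
NEW_DOMAIN_COLOR = "gray"  # unclassified (new) or unmapped domains


def to_dot(edges, categories):
    """Render edges as a Graphviz digraph with one fill color per category."""
    nodes = sorted({node for edge in edges for node in edge})
    lines = ["digraph darkweb {", "  node [style=filled];"]
    for node in nodes:
        color = CATEGORY_COLORS.get(categories.get(node), NEW_DOMAIN_COLOR)
        lines.append(f'  "{node}" [fillcolor={color}];')
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

Because domains that were only discovered as link targets have no top-page classification, they fall back to NEW_DOMAIN_COLOR, which mirrors the idea of giving new domains their own color alongside the classified categories.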
In this way, the edges of the Dark Web could be expressed as a network graph, and the relationships among Dark Web categories, one of the goals of this research, could be visualized. Moreover, it appears that our method can also be used to discover new Onion addresses and deduce their content.
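The three centrality indices discussed next (degree, betweenness, and eigenvector centrality) can be computed as in the plain-Python sketch below. It is an illustration under simplifying assumptions, not the tool used in this study: it ignores hyperlink direction, computes betweenness with Brandes' algorithm for unweighted graphs, and computes eigenvector centrality by power iteration on A + I (which has the same eigenvectors as the adjacency matrix A but also converges on bipartite graphs). An established graph library would be the more likely choice in practice.

```python
from collections import deque
from math import sqrt


def undirected(edges):
    """Adjacency sets that ignore link direction (a simplifying assumption)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is linked to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm (unweighted, undirected, unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths in sigma.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration on A + I, normalized to unit Euclidean length."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sqrt(sum(val * val for val in y.values())) or 1.0
        y = {v: val / norm for v, val in y.items()}
        if sum(abs(y[v] - x[v]) for v in adj) < tol:
            return y
        x = y
    return x
```

On a path a-b-c, for example, b has the highest score under all three indices, because it has the most edges, lies on the only path between a and c, and is adjacent to both other nodes.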
Degree centrality is an index that rates highly a node with numerous edges, whereas betweenness centrality is an index that rates highly a node that bridges a cluster and the rest of the graph. Eigenvector centrality is an index that rates highly a node connected to other highly central nodes.

As an example, after excluding domain lists and search engines and targeting websites with one to 10 links, we show the top three cases in Tables 3 to 5. The results show that degree centrality still identified a relatively large number of websites that introduce Onion domains. What is noteworthy in terms of uncovered malevolence is that the Dark Web SNS website “Atlayo”, which is at the top level, provides bridges between different large groups. Moreover, it also became clear that “Hidden Answers” performed a similar community site function. Using eigenvector centrality, it was possible to locate the “X-Market” site on the Dark Web, and it could be determined that it was an accessible market system that was actually written in Russian.

Table 3-1: Degree Centrality
Title               URL
Daniel's Hosting    http://dhosting4okcs22v.onion
Onion Wiki          http://wikiti3e4q2ca2e7.onion
Piwiki              http://statssizc4e5rtnk.onion

Table 3-2: Betweenness Centrality
Website             URL
Daniel - Home       http://tt3j2x4k5ycaa5zt.onion
Hidden Answers      http://answerstltvcsbgh.onion
Atlayo              http://atlayofke5rqhsma.onion

7 CONCLUSIONS
In this paper, we proposed an analysis method that targets the classifications and relationships of Dark Web content on the Tor network. We also implemented a viewer for the visualized node graph and obtained evaluations from actual proof-of-concept experiments. Specifically, 12,542 Tor domains were collected using Ichidan and Fresh Onions and categorized from the text of their top pages using a naive Bayes algorithm. Moreover, the relationships among the categories were visualized from their hyperlinks.

To facilitate visualization and to make it easier to understand how the various categories connect with each other, different colors were given to different nodes, including new domains as well as the classified categories. When our implementation was applied to our network graph, the relationships and features of fellow travelers on the Dark Web could be clarified. Moreover, as a result of analyzing our network graph, a Dark Web version of an SNS and a market system for illicit products could be discovered.

REFERENCES
[1] Michael K. Bergman, “The Deep Web: Surfacing Hidden Value”, Journal of Electronic Publishing, Volume 7, Issue 1, 2001
[2] Nikkei TECH, “Dark Web”, http://tech.nikkeibp.co.jp/atcl/nxt/column/18/00178/040600011/
[3] ASCII, “The secret of an anonymity communications system Tor that police information was made to reveal”, http://ascii.jp/elem/000/000/588/588241/
[4] Tokyo paper, “Dark site incident 10 years criminal information net deeply”, http://www.tokyo-np.co.jp/article/national/list/201708/CK2017082402000235.html
[5] Tor Metrics, “Servers - Tor Metrics”, https://metrics.torproject.org/networksize.html
[6] Tor Blog, “Ethical Tor Research: Guidelines”, https://blog.torproject.org/ethical-tor-research-guidelines
[7] THE ZERO / ONE, “Tor HSDir which spies a dark web is found 100 or more.”, https://the01.jp/p0002855/
[8] THE ZERO / ONE, “The University of Sao Paulo looked into the dark web? Exhausted from Tor Relay Management”, https://the01.jp/p0005695/
[9] Ichidan, http://ichidanv34wrx7m7.onion/
[10] Fresh Onions, http://zlal32teyptf4tvi.onion/
[11] Shodan, https://shodan.io/
[12] Information Security, “Inappropriate information disclosure and countermeasures for increasing Internet access equipment”, https://www.ipa.go.jp/files/000052712.pdf
[13] freshonions-torscraper, https://github.com/dirtyfilthy/freshonions-torscraper
[14] Identity-focused Cyber Intelligence, https://4iq.com/
[15] Forbes Japan, “1.4 billion private data leaked to Dark Web, well-known porn site”, https://forbesjapan.com/articles/detail/18912
[16] Ahmed T. Zulkarnine, Richard Frank, and Bryan Monk, “Surfacing collaborated networks in dark web to find illicit and criminal content”, Intelligence and Security Informatics, 2016
[17] Ono Akito, Kamizono Masaki, Kasama Takahiro, and Uehara Tetsutaro, “Darkweb Analysis System Combining HSDir's Snooping and Active Scanning”, Information and Communication System Security, 2018-02