

Dark Web Content Analysis and Visualization

Sugiu Takaaki
Information Security Lab, Tokyo Denki University
Senju-Asahi-cho, Adachi-ku, Japan
sugiu@isl.im.dendai.ac.jp

Inomata Atsuo
Information Security Lab, Tokyo Denki University
Senju-Asahi-cho, Adachi-ku, Japan
inomata@mail.dendai.ac.jp

ABSTRACT
The Dark Web, a vast array of encrypted online content and websites that can only be accessed with anonymizing tools such as The Onion Router (Tor), is currently a topic of serious concern because it has become a growth area for clandestine entities catering to forbidden activities and services such as illegal drugs and weapons, child pornography, sensitive information such as stolen credit card data, distributed denial of service (DDoS) tools such as Booter, and other antisocial activities. As a result, it is now being investigated by researchers, law enforcement agencies, security companies, and academic institutions around the world. However, no efficient investigation method exists, and investigations take a great deal of time. In an effort to gain a better understanding of the Dark Web, we analyzed a large amount of Onion domain name data obtained using the "Ichidan" search engine and the "Fresh Onions" open source Tor hidden service crawler. More specifically, we gathered every Onion domain we could acquire and then downloaded each top page. Each Onion domain was then classified into one of six categories (or marked as "classification not possible") using the downloaded top-page text, and new domains were discovered from that text as well. Additionally, we implemented a visualization based on a directed graph created from the uncovered hyperlinks, so that the connection states of the Dark Web can be grasped simply. We then attempted to determine the relationships and characteristics of each instance of Dark Web content by reflecting the classification results in the graph.

KEYWORDS
Dark Web, Tor, Hidden Service, Network Graph, Visualization

ACM Reference format:
Sugiu Takaaki and Inomata Atsuo. 2019. Dark Web Content Analysis and Visualization. In Fifth ACM International Workshop on Security and Privacy Analytics (IWSPA '19), March 27, 2019, Richardson, TX, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3309182.3309189

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
IWSPA '19, March 27, 2019, Richardson, TX, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6178-1/19/03…$15.00
https://doi.org/10.1145/3309182.3309189

1 INTRODUCTION
The information that is collected by normal search engines operating on the World Wide Web (WWW) is called "Surface Web" content, and most blogs, news sites, and social networking service (SNS) sites are identified during such searches. Surface websites are typically registered with search engines through hyperlinks or by submitting applications. However, there is another level of the WWW, filled with information that cannot be found using Surface Web search engines, called the "Deep Web". This refers to, for example, websites that normal search engines cannot reach because they are not listed in those engines and must be accessed by direct contact with a Uniform Resource Locator (URL) or Transmission Control Protocol/Internet Protocol (TCP/IP) address, which may also require passwords and/or satisfying other security measures to gain access past the top page. A survey performed by the University of California [1] found that there is 400 to 550 times more information on the Deep Web than is available as public information on the Surface Web.

Lurking within the Deep Web is yet another shadowy Internet space called the "Dark Web" that cannot be reached by search engines and can only be accessed or browsed using anonymized networks such as The Onion Router (Tor), The Invisible Internet Project (I2P), and Freenet, together with dedicated applications corresponding to those networks [2]. Data handled by anonymized networks such as Tor pass through multiple relay agents and leave no log traces in any of the servers they pass through. In addition, all data, except at the exit point of the access route, are encrypted. Furthermore, to make tracking even more difficult, Tor changes its routing at regular intervals [3], resulting in a highly anonymous network. The anonymity made possible by the Dark Web has made it attractive to sites where trading in illegal drugs and weapons, child pornography, sensitive data such as stolen credit card information, and the like is rampant. It also provides cover for DDoS tools such as Booter and other forms of cyberattack [4]. However, if all of the illegal sites on the Dark Web could be removed, it would not be a dangerous place. Like the Surface Web, it is primarily an environment for simple blogs, community sites, forums, and so on.

2 ABOUT TOR
Tor is the onion routing anonymization network devised by employees at the US Naval Research Laboratory (NRL). It is implemented by combining peer-to-peer (P2P) technology, in which clients communicate in one-to-one relationships, with SOCKS technology for relaying TCP/IP communications. Tor's relay nodes are spread all over the world, and the network currently comprises more than 6,000 servers [5]. Since those relay nodes keep no log information, node connections are changed at fixed intervals, and all communications other than the exit traffic are encrypted, the communication source cannot be determined and the level of anonymity provided is very high. Figure 1 shows a summary of the access route in Tor.

Figure 1: Access course in Tor

2.2 Tor ethics guideline
The Tor Project has an ethics guideline [6] that forbids certain actions. Examples of unacceptable research activity include:

1. It is forbidden to run a hidden service directory (HSDir), harvest onion addresses, and publish or connect to those onion addresses [7].
2. It is forbidden to set up relays to sniff or tamper with exit traffic. While some broad measurements may be acceptable depending on risk/benefit trade-offs, fine-grained measurements are not allowed [7].
3. It is forbidden to set up relays that are deliberately dysfunctional.

Any research team breaking these guidelines, even if they are engaged in classification research related to malicious anonymous services or simply collecting Onion addresses, faces permanent banning from the Tor relay network [8]. In consideration of those guidelines, our research refrained from using HSDir or exit nodes. Instead of harvesting massive numbers of Onion domains ourselves, we performed our analyses using the "Ichidan" [9] search engine and the "Fresh Onions" [10] open source Tor crawler. Hence, we can assert that our actions did not violate the ethical guidelines described above.

Ichidan is a search engine for the Dark Web similar to Shodan [11]. Shodan is a website that collects and publishes information on devices connected to the Internet by conducting port scans and banner surveys [12]. Fresh Onions, a Tor hidden service crawler, is an open source project published on GitHub, a web-based hosting service for version control using Git [13]. Git is a version-control system for tracking changes in computer files and coordinating work on those files among multiple people.

3 DARK WEB INCIDENTS
In this section, we begin by discussing the Coincheck incident, a Dark Web-related incident that occurred in Japan. Specifically, the Coincheck exchange was accessed illegally around the end of January 2018, and an amount of the NEM cryptocurrency (XEM) equivalent to about 58 billion yen ($500 million) was misappropriated. After that, the person believed to be the perpetrator set up a virtual currency trading site on the Dark Web and used it to exchange the stolen NEM for other virtual currencies. Eventually, all of the misappropriated cryptocurrency was traded away. Figure 2 shows a screenshot from a website established by a person who appears to be the criminal.

Figure 2: Website set up by a person believed to be a criminal

The next incident relevant to the Dark Web that we will examine occurred overseas and involved investigators from the security company 4iQ [14]. Those investigators reported finding files containing 41 gigabytes of data, including 1.4 billion username and password sets [15]. Since those passwords were stored in plaintext, the accounts they were supposed to protect could have been easily exploited if that information had fallen into malicious hands.

4 RELATED WORK
In a study aimed at elucidating illegal content on the Dark Web, Zulkarnine et al. [16] reported on the development of a web crawler that automatically crawls websites based on predefined keywords and on its application to the Dark Web. Their investigations uncovered numerous illegal websites linked to extremists and terrorist groups, and revealed their mutual interconnections. In investigations based on popular keywords, they found it possible to map the Dark Web to a network graph, compare in-degree and out-degree scores, and discover the popular websites at the center. One manual investigation of popular Dark Web websites uncovered links to suppliers of the illegal Russian drug market and documents for connecting to the Tor network, as well as political blogs and other sites claiming top-ranking positions in anti-US government movements. That investigation differed from this research in the manner in which websites were classified and contrasted. More specifically, that study aimed at the early detection and monitoring of radical group websites and groups espousing anti-American positions, while this research aims primarily at roughly classifying the websites currently existing on the Dark Web into six different categories. Furthermore, even though the network graph visualization method used is the same as that employed in the abovementioned research, this study demonstrates how colors can be used to mark the category of each node in a way that allows the connections for each content type to be more easily grasped, clarified, and presented.

Ono et al. [17] proposed a Dark Web analysis system that uses HSDir snooping to gather Onion addresses. More specifically, when a user connects to an Onion domain using an anonymizing service, the client needs access to an HSDir to obtain information such as introduction points. Consequently, it becomes possible to collect information on the Onion domains that clients actually access by installing an HSDir and conducting observations. Using this structure, information on the servers that carry out Dark Web hosting can be collected by performing a service scan of the collected Onion domains. The results obtained via such scanning made it clear that many servers expose services in unsuitable states. When classifying Onion domains into three categories according to the anonymity of the service, the most anonymous group was given the highest ranking. In addition, they manually accessed the top 15 Onion addresses with numerous queries and confirmed their content, which turned out to be black market sites dealing in illegal drugs. The research described in this paper differs from the one described above in that we do not install observation equipment such as honeypots into the Tor network, we perform everything using open data only, and we do not engage in active scanning of the Dark Web. Furthermore, whereas the abovementioned study uses the frequency of access attempts as a method of finding illegal sites, our study is based solely on the usage frequency of predefined keywords.

5 DARK WEB ANALYSIS
This section explains the purpose behind our proposed technique for analyzing the Dark Web. In this research, in addition to the illegal sites that are presumed to be on the Dark Web, we attempted to classify all of the detected sites, including sites that are not illegal but that can often be seen when browsing the Dark Web, into rough categories. Then, based on how the resulting categories are interconnected, we attempt to discover previously unknown Onion domains by analyzing the detected links, and to classify the detected websites into categories without actually accessing the newly detected Tor domains. We begin by providing a step-by-step overview of the procedure for analyzing Dark Web content, after which we show the results.

1. Numerous Onion domains are acquired.
   1.1 Onion domain names for which "http" is valid are acquired using Ichidan.
   1.2 The domain names added to the list within the past year are acquired using Fresh Onions; this is performed for all of the listed Onion domains.
2. All acquired Onion domains are scraped and the top page of each is downloaded.
3. Using the text on each top page, the data are roughly classified into six categories. Note that it is not possible to classify new domains at this point.
4. A directed graph is created from the hyperlinks on the pages.
5. The category classification results are reflected in the created directed graph.

5.1 Onion domain acquisition
The search method assigns a protocol name, for example, to the query part of the URL. In this case, "http://ichidanv34wrx7m7.onion?query=http" is used in order to acquire Onion domains for which HTTP is valid, and a list of such domains is enumerated. Note that, as of this writing, Ichidan is offline and cannot be accessed. If "http://zlal32teyptf4tvi.onion/json/all" is accessed, JavaScript Object Notation (JSON) containing the domain names, all of the indicated titles, the dates added, and so on is downloaded and displayed.

5.2 Scraping Onion domains
When Tor is executed, port 9050 of localhost is in the listening state. By setting this as a proxy, it becomes possible to connect from a program to the Dark Web; a minimal sketch of this step is given below. A summary of the data acquired during this study is shown in Tables 1-1 and 1-2.
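As an illustration of Secs. 5.1 and 5.2, the following is a minimal sketch, not the implementation used in this study, of how the Fresh Onions domain list could be fetched and the top pages downloaded through the local Tor SOCKS proxy. It assumes Tor is listening on port 9050 and that the Python "requests" package is installed with SOCKS support (requests[socks]); the JSON field name "host" is an assumption about the Fresh Onions output format.

import requests

# socks5h: hostname resolution happens inside Tor, which .onion addresses require.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}
FRESH_ONIONS_JSON = "http://zlal32teyptf4tvi.onion/json/all"   # endpoint from Sec. 5.1

def fetch_domain_list():
    # Download the JSON list of known Onion domains from Fresh Onions.
    resp = requests.get(FRESH_ONIONS_JSON, proxies=TOR_PROXIES, timeout=60)
    resp.raise_for_status()
    return resp.json()

def download_top_page(domain):
    # Scrape the top page of one Onion domain; return None if it is unreachable.
    try:
        return requests.get("http://%s/" % domain, proxies=TOR_PROXIES, timeout=60).text
    except requests.RequestException:
        return None

if __name__ == "__main__":
    for entry in fetch_domain_list()[:10]:        # small sample for illustration
        domain = entry.get("host", "")            # "host" is an assumed field name
        print(domain, "ok" if download_top_page(domain) else "unreachable")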


Table 1-1: Summary of acquired data (Ichidan)
Number of total websites acquired: 8,291
Number of reachable websites: 5,375
Collection period: 2018/01/30 - 2018/02/04

Table 1-2: Summary of acquired data (Fresh Onions)
Number of total websites acquired: 18,957
Number of reachable websites: 7,167
Collection period: 2018/06/13 - 2018/07/10

5.3 Preliminary category classifications
Naive Bayes is a commonly used method for categorizing websites. The algorithm is used for website category classification as well as for document classification and spam email filtering. The posterior probability of category cat given document doc is P(cat|doc), which satisfies Formula (1):

  P(cat|doc) = P(cat) P(doc|cat) / P(doc) ∝ P(cat) P(doc|cat)    (1)

The document doc can be expressed as a bag of words, and if the words are assumed to be independent, P(doc|cat) can be calculated as shown in Formula (2):

  P(doc|cat) = P(word_1 ∧ … ∧ word_k | cat) = ∏_i P(word_i | cat)    (2)

Moreover, when P(cat|doc) becomes a very small number, there is a possibility of causing an underflow. Therefore, a logarithm is taken so that the multiplications become additions. In addition, if a word does not appear in the original training data, P(cat|doc) becomes 0 and the calculation cannot be performed. This is called the zero-frequency problem. In such cases, Laplace smoothing, which adds 1 to the number of occurrences of each word, is often used as a solution. With this smoothing, the word likelihood of the general naive Bayes classifier is given by Formula (3):

  P(word_i | cat) = (T(cat, word_i) + 1) / Σ_{word' ∈ V} (T(cat, word') + 1)
                  = (T(cat, word_i) + 1) / ((Σ_{word' ∈ V} T(cat, word')) + |V|)    (3)

In this case, it is assumed that the test data always belongs to one of the training-data categories. However, since the Dark Web hosts a wide variety of categories, Formula (3) is unsuitable. Therefore, when the zero-frequency problem occurs, classification is performed without applying Laplace smoothing.

An example of the training data is shown in Table 2. When defining keywords, those belonging to multiple categories, such as "btc", are excluded, and the number of training examples is set to 200 or more. This allows the categories to be distinguished more easily.

Table 2: Example keywords and their categories
hacking: RATs, backdoor, malware, DDoS, booter, hacker, zero-day, threat, cracking
drug: drug, marihuana, cannabis, narcotic, meth, herb, mema, gram, pill
develop: debian, apt, ubuntu, program, package, oss, develop, centos, python
porn: sex, sexual, teen, girl, porn, adult, erotic, hentai, asian
news: terror, syria, china, legal, news, blog, rss, chat, form
casino: casino, roulette, slots, poker, blackjack, baccarat, trump, craps, game

• hacking: sites dealing with Booters and vulnerability information
• drug: sites dealing with cannabis, stimulants, and other illegal intoxicants
• develop: sites dealing with IT technology
• porn: adult sites
• news: news, political commentary, etc.
• casino: online casinos

The naive Bayes classification results are shown in Figure 3. It can be seen that about half of the total is classifiable. The smallest category is "develop", which comprises package information and references. The most commonly observed category was "news", which refers to sites carrying news articles and political commentary.
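The following is a minimal sketch of the keyword-based naive Bayes step described above, not the exact implementation used in the study. It assumes the training data is simply the per-category keyword lists of Table 2 (truncated here), uses log probabilities to avoid underflow, and approximates the paper's handling of the zero-frequency case by ignoring words outside the defined vocabulary.

import math
import re
from collections import Counter

# A few of the Table 2 keywords per category (truncated for illustration).
KEYWORDS = {
    "hacking": ["rats", "backdoor", "malware", "ddos", "booter", "hacker", "zero-day"],
    "drug":    ["drug", "marihuana", "cannabis", "narcotic", "meth", "herb", "pill"],
    "develop": ["debian", "apt", "ubuntu", "program", "package", "oss", "python"],
    "porn":    ["sex", "sexual", "teen", "girl", "porn", "adult", "erotic"],
    "news":    ["terror", "syria", "china", "legal", "news", "blog", "rss"],
    "casino":  ["casino", "roulette", "slots", "poker", "blackjack", "baccarat"],
}

def classify(top_page_text, alpha=1.0):
    # Return the most probable category for a top page, or None if no keyword appears.
    words = Counter(re.findall(r"[a-z0-9\-]+", top_page_text.lower()))
    vocab = {w for kws in KEYWORDS.values() for w in kws}
    hits = {w: c for w, c in words.items() if w in vocab}
    if not hits:
        return None                      # "classification not possible"
    scores = {}
    for cat, kws in KEYWORDS.items():
        counts = Counter(kws)            # T(cat, word): one occurrence per defined keyword here
        total = sum(counts.values())
        # log P(cat) assumed uniform; sum of log P(word|cat) with Laplace smoothing (Formula 3)
        score = 0.0
        for w, c in hits.items():
            score += c * math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
        scores[cat] = score
    return max(scores, key=scores.get)

print(classify("fresh cannabis shipped worldwide, pay with btc"))   # -> "drug"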


Figure 3: Classification results

5.4 Directed graph created from hyperlinks
We analyzed how many hyperlinks appear on the downloaded onion web pages. The analysis results are shown in Figure 4. Only 4,297 of the 12,542 Tor domains had at least one link to an address outside their own domain, whereas 66% of Tor domains had no such link. Furthermore, about 95% of the addresses had fewer than 100 links.

Figure 4: Hyperlink analysis result

Figure 5 is a directed graph in which each node is an Onion domain and each edge points to the domain that the node links to.

Figure 5: Directed graph

5.5 Reflecting category classification results
The results that were divided into the six categories described in Sec. 5.3 were then reflected in the graph, and colors were applied to each category to make the created graph visually easy to understand. Since the entire created graph is large, enlarged portions are shown in Figure 6 and Figure 7.

In this way, the edges of the Dark Web could be expressed as a network graph, and the relationships among the Dark Web categories, which was one of the goals of this research, could be visualized. Moreover, it appears that our method can also be used to discover new Onion addresses, deduce their content, and so on. A minimal sketch of the graph construction and coloring is given below.

Figure 6: Category directed graph (enlarged)

Figure 7: Category directed graph (more enlarged)
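As a rough sketch of how Secs. 5.4 and 5.5 could be reproduced, and under the assumption that the graph is built with the networkx library (the paper does not state which library was used), the directed graph can be constructed from the .onion links found in each downloaded top page, with each node colored by its classified category. The link-extraction regular expression and the color mapping are illustrative choices, not the paper's own.

import re
import networkx as nx

ONION_RE = re.compile(r"https?://([a-z2-7]{16}(?:[a-z2-7]{40})?\.onion)")  # v2/v3 onion hosts

COLORS = {"hacking": "red", "drug": "green", "develop": "blue",
          "porn": "magenta", "news": "orange", "casino": "purple"}

def build_graph(pages, categories):
    # pages: {domain: top-page HTML}; categories: {domain: category name or None}.
    g = nx.DiGraph()
    for src, html in pages.items():
        g.add_node(src, color=COLORS.get(categories.get(src), "gray"))
        for dst in set(ONION_RE.findall(html or "")):
            if dst != src:                     # links within the same domain are excluded
                g.add_edge(src, dst)
                # newly discovered domains default to gray ("classification not possible")
                g.nodes[dst].setdefault("color", COLORS.get(categories.get(dst), "gray"))
    return g

# Example:
# g = build_graph(pages, categories)
# nx.draw(g, node_size=20, arrows=True,
#         node_color=[g.nodes[n].get("color", "gray") for n in g.nodes])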


6 DISCUSSION

6.1 Classification results
Although we were able to classify more than half of the cases, there is still significant room for improvement. There are two possible causes for many of the remaining problems, which are discussed below.

First, it must be noted that all of the keywords defined in this study are English, even though numerous languages such as Russian, Chinese, French, and Japanese are commonly used on the Dark Web. Therefore, in order to classify websites that use a language other than English, it will be necessary to define keywords that work in multiple languages, which can be expected to be quite difficult.

The second point is that some keywords could not be assigned to appropriate categories because their uses in those categories are rare. When defining a category, 300 sites were perused manually at random, and characteristic keywords were extracted for each. Simultaneously, keywords considered most likely to be contained in particular categories were considered and enumerated. Ultimately, each category was defined by ten or more keywords. However, slang-like expressions exist and are used across numerous categories; for example, in the case of terms referring to medicines and drugs, it is often impossible to enumerate them all when constructing the "drug" category.

6.2 Content connection
The graph created in this study has several features. Two of them, observed when paying attention to the degree and the category of each node, are described here. There are more than 400 small groups with fewer than 10 edge connections on the outer periphery of the graph. There are also several small groups near the center of the graph in which one or more of the nodes have connections with at least one other group. Therefore, it is considered likely that the groups on the outer periphery of the circle have few connections with other Onion domains because the number of accesses is limited and the scale of those sites is small.

Next, when considering the small groups near the center of the graph, we find that the nodes connecting the groups are likely to be community sites and blogs that disseminate information. The Tor network previously had several search engines similar to Google and Yahoo, but many of those are now closed. Therefore, it is speculated that the pattern in which link collections and information sites are identified by their website addresses is still in use, just as it was in the early days of the Internet.

Finally, we consider the presence of nodes with degree greater than 1,000. There are 16 such nodes, which is clearly abnormal. In addition, there is a single node with an abnormally high degree of 21,880. After manually accessing the 16 Onion addresses with abnormal degrees, we determined that they consisted of search engines and chat sites. It was also found that nodes of the same category formed tightly clustered groups. As one example, a population centered on drugs is shown in Figure 8. Although there are some nodes that belong to different categories, it can be seen that many are drug nodes and that they are all closely tied together.

Figure 8: Examples of groups of similar categories

Figure 9 shows how often the category of a link source matches the category of its link destination. It can be said that there is a high possibility that sites of the same category link to each other in the drug and news categories. On the other hand, links between the casino and hacking categories are relatively weak.

Figure 9: Category matching rate of link source and link destination

6.3 Network analysis
In network analysis, computing centrality provides an important index, especially when attempting to comprehend the important nodes or the outline of a graph. Although there are various kinds of centrality, in order to discover popular websites and websites that connect to other nodes of high centrality, we computed degree centrality, betweenness centrality, and eigenvector centrality, as sketched below.
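A minimal sketch of this centrality computation, assuming the directed hyperlink graph g from Sec. 5.4 is available as a networkx graph (the paper does not state which library was actually used):

import networkx as nx

def top_nodes(scores, k=3):
    # Return the k nodes with the highest centrality scores.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def centrality_report(g, k=3):
    degree = nx.degree_centrality(g)
    betweenness = nx.betweenness_centrality(g)
    # Eigenvector centrality may fail to converge on sparse directed graphs,
    # so the iteration limit is raised here.
    eigenvector = nx.eigenvector_centrality(g, max_iter=1000)
    return {"degree": top_nodes(degree, k),
            "betweenness": top_nodes(betweenness, k),
            "eigenvector": top_nodes(eigenvector, k)}

# Example (corresponding to Tables 3-1 to 3-3):
# print(centrality_report(g))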


Degree centrality is an index by which a node with numerous edges is evaluated highly, whereas betweenness centrality is an index by which a node that connects one cluster to another, and the cluster itself, is evaluated highly. Eigenvector centrality is an index by which nodes connected to other highly central nodes are evaluated highly.

As an example, excluding domain lists and search engines and targeting websites with one to 10 links, we show the top three cases in Tables 3-1 to 3-3. The results show that degree centrality still identified a relatively large number of websites that introduce Onion domains. What is noteworthy in terms of uncovering malevolence is that the Dark Web SNS website "Atlayo", which is at the top level, provides bridges between different large groups. Moreover, it also became clear that "Hidden Answers" performs a similar community site function. Using eigenvector centrality, it was possible to locate the "X-Market" site on the Dark Web and to determine that it is an accessible market system actually written in Russian.

Table 3-1: Degree Centrality
Daniel's Hosting    http://dhosting4okcs22v.onion
Onion Wiki          http://wikiti3e4q2ca2e7.onion
Piwiki              http://statssizc4e5rtnk.onion

Table 3-2: Betweenness Centrality
Daniel - Home       http://tt3j2x4k5ycaa5zt.onion
Hidden Answers      http://answerstltvcsbgh.onion
Atlayo              http://atlayofke5rqhsma.onion

Table 3-3: Eigenvector Centrality
X-Market            http://xmarket7sw2fba6b.onion
Схоронил!           http://magn37z4yo7mhylp.onion
MagusNet Info and Stuff    http://tghtnwsywvvhromy.onion.html

7 CONCLUSIONS
In this paper, we proposed an analysis method that targets the classification of, and relationships among, Dark Web content on the Tor network, and we implemented and executed a viewer for the visualized graph nodes in order to obtain evaluations from actual proof-of-concept experiments. Specifically, 12,542 Tor domains were collected using Ichidan and Fresh Onions and categorized from their top-page text using a naive Bayes algorithm. Moreover, the relationships among the categories were visualized from their hyperlinks.

To facilitate visualization and make it easier to understand how the various categories connect with each other, different colors were given to the different nodes, covering new domains as well as the classified categories. When our implementation was applied to our network graph, the relationships and features among Dark Web sites could be clarified. Moreover, as a result of analyzing our network graph, a Dark Web version of an SNS and a market system for illicit products could be discovered.

REFERENCES
[1] Bergman, Michael K., "The Deep Web: Surfacing Hidden Value", Journal of Electronic Publishing, Volume 7, Issue 1, 2001.
[2] Nikkei TECH, "Dark Web", http://tech.nikkeibp.co.jp/atcl/nxt/column/18/00178/040600011/
[3] ASCII, "The secret of the Tor anonymous communication system that the police were made to reveal", http://ascii.jp/elem/000/000/588/588241/
[4] Tokyo Shimbun, "Dark site incident, 10 years on: criminal information deep on the net", http://www.tokyo-np.co.jp/article/national/list/201708/CK2017082402000235.html
[5] Tor Metrics, "Servers - Tor Metrics", https://metrics.torproject.org/networksize.html
[6] Tor Blog, "Ethical Tor Research: Guidelines", https://blog.torproject.org/ethical-tor-research-guidelines
[7] THE ZERO/ONE, "More than 100 Tor HSDirs that spy on the dark web are found", https://the01.jp/p0002855/
[8] THE ZERO/ONE, "Did the University of Sao Paulo look into the dark web? Expelled from Tor relay management", https://the01.jp/p0005695/
[9] Ichidan, http://ichidanv34wrx7m7.onion/
[10] Fresh Onions, http://zlal32teyptf4tvi.onion/
[11] Shodan, https://shodan.io/
[12] IPA, "Inappropriate information disclosure and countermeasures for the increasing number of Internet-connected devices", https://www.ipa.go.jp/files/000052712.pdf
[13] freshonions-torscraper, https://github.com/dirtyfilthy/freshonions-torscraper
[14] 4iQ, Identity-focused Cyber Intelligence, https://4iq.com/
[15] Forbes Japan, "1.4 billion items of private data leaked to the Dark Web, including from a well-known porn site", https://forbesjapan.com/articles/detail/18912
[16] Ahmed T. Zulkarnine, Richard Frank, and Bryan Monk, "Surfacing collaborated networks in dark web to find illicit and criminal content", Intelligence and Security Informatics, 2016.
[17] Ono Akito, Kamizono Masaki, Kasama Takahiro, and Uehara Tetsutaro, "Darkweb Analysis System Combining HSDir's Snooping and Active Scanning", Information and Communication System Security, 2018-02.
