Pavlos Evangelatos, Mathijs Kappetijn, Michiel de Ruiter, Martin Schvarcbacher and Juan Agustin Tibaldo
1 INTRODUCTION
Social networks have grown rapidly in popularity since 2008, when they entered a period of exponential growth [4]. These social networks can provide a vast amount of data about their users and the communication patterns they display. Analyzing this data can require specialized data mining and visualization tools. In this paper, we present a method for visually analyzing temporal graph data from one popular social network and show how it can be generalized to other similar networks.
We focus on analyzing the SNAP Reddit dataset [3], which contains two and a half years of user-generated data collected from Reddit. Reddit is an open communication and link-sharing platform where users can share links to content located on the internet, including links to different parts of Reddit itself, which are called subreddits. The dataset we use catalogs posts on Reddit that link back to Reddit itself and annotates each of them with a common set of attributes. These include the overall sentiment of the source post, where users voted on whether the given post is positive or negative. Furthermore, each post has a Linguistic Inquiry and Word Count (LIWC) score [6], which can be used to classify the post's content. This dataset is temporal and represents a directed graph in which subreddits are the nodes and the linking posts are timestamped, directed edges.
Our visualization tool focuses on showing the relationships between different Reddit communities by analyzing their interactions. This is done by identifying which communities a given community references most often and by measuring the volume of incoming and outgoing posts along with their attributes. Users can also observe changes in these communities over time, taking advantage of the temporal nature of the dataset. Lastly, users can select one or more communities and compare them on several aspects, such as their readability, their ratio of incoming to outgoing posts and their overall sentiment. This shows how different subreddits are perceived by other communities.
Our tool aims to enable the visual exploration of the relationships between multiple Reddit communities and of their interactions. Additionally, we enable the comparison of several of these communities based on pre-defined features in order to find similarities between them.
The main contributions of our paper are:

• We present how to visualize sentiment data for temporal directed graphs and how to use it to determine the relationships between different communities.

• We present a way to compare features of Reddit communities and how they evolve over time.
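The data flow described above, from timestamped hyperlinks to per-pair volumes, sentiment and incoming/outgoing counts, can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the tab-separated layout of the SNAP files (columns SOURCE_SUBREDDIT, TARGET_SUBREDDIT, POST_ID, TIMESTAMP, LINK_SENTIMENT and PROPERTIES, with LINK_SENTIMENT stored as +1 or -1). The helper names `aggregate`, `in_out_ratio` and `edge_style`, the red-to-green color scale and the sample rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Assumption: the SNAP soc-RedditHyperlinks TSV files use these columns:
# SOURCE_SUBREDDIT, TARGET_SUBREDDIT, POST_ID, TIMESTAMP, LINK_SENTIMENT, PROPERTIES
# and LINK_SENTIMENT is +1 (positive) or -1 (negative).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class EdgeStats:
    """Aggregated statistics for one directed source -> target pair."""
    volume: int = 0         # number of hyperlinking posts
    sentiment_sum: int = 0  # sum of the +1/-1 sentiment labels

    @property
    def mean_sentiment(self) -> float:
        return self.sentiment_sum / self.volume if self.volume else 0.0


def aggregate(tsv_text, start, end):
    """Aggregate hyperlinks whose timestamp lies in [start, end).

    Returns (edges, out_volume, in_volume): edges maps (source, target)
    to EdgeStats; the two volume dicts count links per subreddit.
    """
    edges = defaultdict(EdgeStats)
    out_volume = defaultdict(int)
    in_volume = defaultdict(int)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ts = datetime.strptime(row["TIMESTAMP"], TIME_FORMAT)
        if not (start <= ts < end):
            continue  # outside the window selected on the timeline
        src, dst = row["SOURCE_SUBREDDIT"], row["TARGET_SUBREDDIT"]
        stats = edges[(src, dst)]
        stats.volume += 1
        stats.sentiment_sum += int(row["LINK_SENTIMENT"])
        out_volume[src] += 1
        in_volume[dst] += 1
    return edges, out_volume, in_volume


def in_out_ratio(subreddit, out_volume, in_volume):
    """Incoming share in / (in + out), bounded to [0, 1]."""
    incoming = in_volume.get(subreddit, 0)
    total = incoming + out_volume.get(subreddit, 0)
    return incoming / total if total else 0.0


def edge_style(stats, max_volume, max_width=8.0):
    """Hypothetical encoding: line width grows linearly with volume and the
    RGB color runs from red (mean sentiment -1) to green (mean +1)."""
    width = 1.0 + (max_width - 1.0) * stats.volume / max_volume
    t = (stats.mean_sentiment + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    return width, (round(255 * (1 - t)), round(255 * t), 0)


# Hypothetical sample rows in the assumed layout.
SAMPLE = (
    "SOURCE_SUBREDDIT\tTARGET_SUBREDDIT\tPOST_ID\tTIMESTAMP\tLINK_SENTIMENT\tPROPERTIES\n"
    "subredditdrama\tpolitics\tp1\t2015-03-01 10:00:00\t-1\t0.1,0.2\n"
    "subredditdrama\tpolitics\tp2\t2015-03-02 11:30:00\t1\t0.3,0.4\n"
    "subredditdrama\tgaming\tp3\t2015-03-05 09:15:00\t-1\t0.5,0.6\n"
    "gaming\tsubredditdrama\tp4\t2016-01-01 00:00:00\t1\t0.7,0.8\n"
)

if __name__ == "__main__":
    edges, out_v, in_v = aggregate(SAMPLE, datetime(2015, 1, 1), datetime(2016, 1, 1))
    peak = max(s.volume for s in edges.values())
    for (src, dst), stats in sorted(edges.items()):
        print(src, "->", dst, stats.volume, stats.mean_sentiment, edge_style(stats, peak))
    print("subredditdrama incoming share:", in_out_ratio("subredditdrama", out_v, in_v))
```

Restricting `aggregate` to a half-open time window mirrors selecting a range on a timeline. The ratio is computed here as the incoming share in / (in + out) rather than in / out, a bounded variant chosen so that subreddits without outgoing links do not cause a division by zero.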
Fig. 2. Timeline showing the range of the data to be analyzed.
Network view The network view is the main visualization screen, displayed in Figure 3, where we display the relationships between the
4 RESULTS
Our tool was tested within the team of the paper's authors. The tool can be used to find how communities interact and how they differ. For example, in Figure 1, we can see that the community 'subredditdrama' interacts with many other subreddits, mainly in one direction. This could be because 'subredditdrama' mostly links to other
subreddits. The color of the lines also indicates that the sentiment is on the negative side. This can be explained by the fact that many posts from 'subredditdrama' link to other subreddits in times of drama. On further comparison using the radar view, we can see that the outgoing volume is indeed high and the sentiment is below average.
This visualization approach can be adapted to other data that is both temporal and graph-based with multiple attributes. The timeline navigation can support any temporal data and serve as the key means of navigating and displaying changes over time. The network view can be adapted to display different attributes between nodes using line thickness and color. Lastly, the popup radar view can display any ordinal values, which can be converted to ratios for easy comparison.
5 CONCLUSION
We have created a tool that can be used to explore the different relationships within the Reddit communities over time. Users are able to visualize the total volume of posts for a given time frame and then explore these relationships as they evolved over time or view them all on an aggregated level. Furthermore, users can choose to view additional information about the observed subreddits and directly compare them to gain further insight. Our paper makes the following contributions: we present a way to explore relationships in temporal graph data and show how it can be extended to data sets outside of Reddit.
REFERENCES
[1] M. Farrugia and A. Quigley. TGD: Visual data exploration of temporal graph data. Volume 7243, page 724309, January 2009.
[2] N. Kerracher, J. Kennedy, and K. Chalmers. The design space of temporal graph visualisation. In EuroVis (Short Papers), 2014.
[3] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data/soc-RedditHyperlinks.html, June 2014.
[4] E. Ortiz-Ospina. The rise of social media. https://ourworldindata.org/rise-of-social-media. Accessed: 2020-03-15.
[5] A. Park, M. Conway, and A. T. Chen. Examining thematic similarity, difference, and membership in three online mental health communities from Reddit: a text mining and visualization approach. Computers in Human Behavior, 78:98–112, 2018.
[6] J. W. Pennebaker, M. E. Francis, and R. J. Booth. Linguistic Inquiry and Word Count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 2001.