
Visualizing Temporal Sub-Reddit Relationships

Pavlos Evangelatos, Mathijs Kappetijn, Michiel de Ruiter, Martin Schvarcbacher and Juan Agustin Tibaldo

Fig. 1. Main view of our project showing all the design

Index Terms—Reddit, subreddit relationships, visualization

Manuscript received 20 Mar. 2020;

1 INTRODUCTION
Social networks have gone through a large surge in popularity since 2008, when they began to grow exponentially [4]. These social networks can provide a vast amount of data about their users and the communication patterns they display. Analyzing this data can require specialized data mining and visualization tools. In this paper, we present a method of visually analyzing temporal graph data from one popular social network and show how this can be generalized to other similar networks.

We focus on analyzing the SNAP dataset of Reddit [3], which contains two and a half years of user-generated data collected from Reddit. Reddit is an open communication and link sharing platform, where users can share links to content located on the internet, including on different parts of Reddit itself, which are called subreddits. The dataset we use catalogs posts which link from Reddit back to Reddit itself and adds a common set of attributes to this data. This includes the overall sentiment of the source post, where users voted on whether the given post is positive or negative. Furthermore, each post has a Linguistic Inquiry and Word Count (LIWC) score [6], which can be used to classify the post’s content. This dataset is temporal and represents a directed graph.

Our visualization tool focuses on showing the relationships between different Reddit communities by analyzing their interactions. This is done by looking at which communities are most referenced from a given community and by measuring the volume of incoming and outgoing posts along with their attributes. The users can also observe changes over time in these communities, taking advantage of the temporal nature of the dataset. Lastly, the users can select one or more communities and compare them on several aspects, such as their readability, ratio of incoming to outgoing posts and overall sentiment. This provides a comparison of how different subreddits are perceived by other communities.

Our tool aims to enable the visual exploration of the relationships and interactions between multiple Reddit communities. Additionally, we enable the comparison of several of these observed communities based on pre-defined features to find any similarities between them.

The main contributions of our paper are:

• We present how to visualize sentiment data for temporal directed graphs and how to use this for determining the relationships between different communities.

• We present a way to compare features of Reddit communities and how they evolve over time.

The remainder of this paper is structured as follows. Section 2 presents work related to our paper. Section 3 describes how we use our analytical visualization tool to analyze the relationships between Reddit communities. Section 4 presents our results and the insights gained. Finally, we conclude the paper in Section 5.
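
The dataset described in this section can be pictured as a list of timestamped directed edges between subreddits, each carrying a sentiment label. The following is a minimal sketch under stated assumptions, not the implementation used by our tool: the `Link` record, the +1/-1 sentiment encoding, the `aggregate_links` helper and the sample rows are all hypothetical and chosen only for illustration. It shows how the links inside a chosen time window could be aggregated into a post volume and a mean sentiment per directed pair of subreddits.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Link:
    """One cross-subreddit post: a directed, timestamped edge (hypothetical schema)."""
    source: str
    target: str
    timestamp: datetime
    sentiment: int  # assumed encoding: +1 positive, -1 negative


def aggregate_links(links, start, end):
    """Return {(source, target): (volume, mean_sentiment)} for start <= t < end."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [post count, sentiment sum]
    for link in links:
        if start <= link.timestamp < end:
            entry = totals[(link.source, link.target)]
            entry[0] += 1
            entry[1] += link.sentiment
    return {pair: (count, total / count) for pair, (count, total) in totals.items()}


# Made-up sample rows, only to exercise the function.
links = [
    Link("subredditdrama", "askreddit", datetime(2015, 1, 3), -1),
    Link("subredditdrama", "askreddit", datetime(2015, 1, 9), 1),
    Link("askreddit", "pics", datetime(2015, 1, 5), 1),
    Link("subredditdrama", "askreddit", datetime(2015, 2, 1), -1),
]
january = aggregate_links(links, datetime(2015, 1, 1), datetime(2015, 2, 1))
```

Keeping the direction in the dictionary key means that incoming and outgoing volumes of a community can be read off separately, and narrowing `start` and `end` restricts the aggregation to any sub-interval of the data.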
2 RELATED WORK
Kerracher et al. [2] discuss different approaches to visualizing temporal graph data. They categorize these approaches based on how the time is encoded and how the graph structure is represented. The time aspect can be made into a new dimension and then only certain time slices are displayed. This is the approach we have taken. Another approach is to consider time as a graph node itself, which the authors call embedded time, as time becomes part of the graph representation. The most widely used graph representation is the node-link and matrix representation combined with sequential time view. The paper also briefly discusses how these approaches can be mixed to produce different visualizations.

Farrugia et al. [1] present a tool for visual exploration of temporal graph data. They use two different views on temporal network data. A matrix view is used for visualizing changes over time in the network, and a node view (network view) is used to view static snapshots of the network at a given point in time. The matrix view represents the main screen, from which nodes of interest can be selected for further analysis. The matrix view also supports an animation over time of the changes in the data. Our visualization approach is different, in that we start with the network view showing an aggregation of data for a given time frame, giving the user an option to further restrict this time interval down to daily increments.

Park et al. [5] combine visualization and text mining to find patterns in subreddits related to mental health. Their work focuses on taking data from only 3 subreddits and grouping the main topics and keywords from posts in these subreddits using natural language processing. The information gained is then visualized using word clouds (most common themes), and these word clouds are connected with each other to form a network view showing the relationships between these words. A second view is a heatmap of the topics in different subreddits. Our work is different in that we do not focus on topics discussed in a subreddit and instead model the interactions between the subreddits. The authors did not investigate the interactions between the observed communities; they only studied them in isolation and compared their features.

3 VISUALIZING COMMUNITY RELATIONSHIPS
Our visualization is composed of one central screen, which displays all of the major information at a glance. The users can select the time range that is relevant to their interest, or play, section by section, an animation of how the flow changes over time. To further inspect a community, the user can then click on one or more subreddits to view additional information about them.

Timeline navigation To represent the temporal nature of our data, we segment the data into discrete intervals with a finest resolution of 1 day. The bar chart represents the overall volume of posts per time interval and at the same time serves as a navigation control for selecting a specific time range for the overall visualization. Figure 2 shows the overall design of the timeline navigation. This timeline is a central piece of the visualization and is present in all views, as it provides the contextual information of which time period is being visualized. The timeline is also a way to control the flow of the time animation, which enables the users to see the changes over time on the main network view.

Fig. 2. Timeline showing the range of the data to be analyzed

Network view The network view is the main visualization screen, shown in Figure 3, where we display the relationships between the different subreddits. The search functionality is a way to center the subreddit of interest, based on its name or topic, and bring it into the central focus. Here the lines represent the existence of a relationship between the two subreddits, which are represented as the graph nodes. The thickness of the lines represents the relative volume of posts compared to others currently displayed (i.e. not an absolute scale). This enables quick identification of the strongest connections between communities as indicated by the number of interactions. The color scale of the line represents the average sentiment between the subreddits. If the overall post sentiment is negative, the colors approach red tones, whereas more positive sentiments result in green tones. This combination of colors enables the users to quickly determine the overall sentiment without having to click on any extra parts of the graph. The size of the node indicates the number of subscribers, and the colors of the nodes are used for visual distinction. Further information can then be accessed by selecting the subreddit node, which opens the radar view described in the next paragraph.

Fig. 3. Network view showing the relationships between different subreddits

Radar view The radar view is a secondary screen that can be accessed by selecting one or more subreddits from the network view, as seen in Figure 4. This view serves to further explore attributes of the selected subreddits. From this, the user can view the relationship between the relative ratio of incoming and outgoing posts, readability level and other attributes. The side-by-side comparison enables gaining insights into the differences between subreddits and finding any similarities in the patterns. The radar view provides the easiest visual comparison of these different features, and the colors differentiate multiple selected subreddits.

Fig. 4. Radar view enabling the comparison of one or more subreddits and their features

Search View A search bar provides the additional functionality of looking up a specific subreddit and its interconnections. When a user provides an input key, a network is returned in which the specified subreddit is placed as the central node. Moreover, the links connecting it with every other subreddit that it mentioned, or that referred to it, are displayed.

Fig. 5. The network of the typed subreddit leagueoflegends

4 RESULTS
Our tool was tested within the team of the paper’s authors. The tool can be used to find how communities interact and what their differences are. For example, in Figure 1, we can see that the community ’subredditdrama’ interacts with a lot of other subreddits, mainly in one direction. This could be because ’subredditdrama’ mostly links to other subreddits. The color of the line also indicates that the sentiment is on the negative side. This can be explained by many posts from ’subredditdrama’ linking to other subreddits in times of drama. On further comparison using the radar view, we can see that indeed the outgoing volume is high and the sentiment is below average.

This visualization approach can be modified to suit other data which is both temporal and graph based with multiple attributes. The timeline navigation can support any temporal data and be the key for navigating and displaying changes over time. The network view can be adapted to display different attributes between nodes using line thickness and colors. Lastly, the popup radar view can display any ordinal values. These values can be converted to a ratio for easy comparison.

5 CONCLUSION
We have created a tool which can be used to explore the different relationships within the Reddit communities over time. The users are able to visualize the total volume of posts for a given time frame and then explore these relationships as they evolved over time or view them all on an aggregated level. Furthermore, the user can select to view additional information about the observed subreddits and directly compare them to gain further insight. Our paper made the following contributions: we present a way to explore relationships in temporal graph data and present ways this can be extended to other data sets outside of Reddit.

REFERENCES
[1] M. Farrugia and A. Quigley. Tgd: Visual data exploration of temporal graph data. volume 7243, page 724309, 01 2009.
[2] N. Kerracher, J. Kennedy, and K. Chalmers. The design space of temporal graph visualisation. In EuroVis (Short Papers), 2014.
[3] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data/soc-RedditHyperlinks.html, June 2014.
[4] E. Ortiz-Ospina. The rise of social media. https://ourworldindata.org/rise-of-social-media. Accessed: 2020-03-15.
[5] A. Park, M. Conway, and A. T. Chen. Examining thematic similarity, difference, and membership in three online mental health communities from reddit: a text mining and visualization approach. Computers in human behavior, 78:98–112, 2018.
[6] J. W. Pennebaker, M. E. Francis, and R. J. Booth. Linguistic inquiry and word count: Liwc 2001. Mahwah: Lawrence Erlbaum Associates, 71(2001):2001, 2001.
