
Social Media Analysis Web Application

Becir Isakovic
Dino Keco
Nejdet Dogru
Department of Information Technologies
International Burch University

Abstract—Social media is a very important factor in analyzing modern society as a whole, including its values, norms, and behaviors, because it has become part of our everyday life. This study focuses on analyzing social media in a way that lets users define their own preferences for which social media sources to follow (analyze). A web application has been developed that allows a user to follow specific Facebook accounts and to categorize the Facebook posts on those accounts according to user-defined taxonomies. The results of this study are various reports generated from the Facebook posts and their statistics, grouped by the user-defined taxonomies. The benefit of this project is that any user can track in real time when people are talking about a given topic, which gives better insight into society as a whole: its values, its norms, what people find interesting, and more. The tool is also useful for companies that want to track user feedback about their products on social networks.

Keywords—social media analysis; Facebook; big data; NoSQL database; parallel programming

I. INTRODUCTION

Social media has entered every segment of our lives, from the private and personal to the business and professional, and companies conduct many of their business activities through their social media pages. Correct analysis of social media data is very important for companies that want to identify customers’ behaviors and needs, as well as for governments and social institutions that want to understand the needs of society. Analyzing the vast amount of data (big data) collected from social networks allows companies to gather useful insights about their products, services, and customers [1]. Such analysis can be used to measure client satisfaction, the success of marketing and promotion, and customer perception of brands and products; moreover, it can improve many segments of a business, from marketing to sales. Despite all the benefits of social media analytics, many businesses fail to recognize the full potential and power of social media. Similarly, governments fail to recognize the importance of social media and the effects it can have on society.

The article [2] underlines the importance of social media in everyday life and thus the importance of properly analyzing these ‘communities’. It also proposes several machine learning algorithms (matrix factorization, neural networks) and techniques (group recommendation using trust neighbors, a tag-based recommendation algorithm) that can be used to address this problem.

In the article [3], Tripathy, Agrawal, and Rath explained how supervised machine learning methods can be used for sentiment analysis. The authors used Naive Bayes and Support Vector Machine algorithms to determine the sentiment of movie reviews. After comparing the accuracy of these algorithms, they found that the Support Vector Machine performed better than the other algorithms used.

Another interesting approach to analyzing social media was demonstrated by Alp and Öğüdücü in their paper [4], which aims to extract topical information from tweets by using hashtags. They tried and compared different methods, and the best-performing one was based on Latent Dirichlet Allocation (LDA) applied to several merged tweets in order to improve performance. The problem with their approach is that it is highly computationally intensive and that it performs poorly on short tweets.

Chung-Hong Lee conducted an interesting study on evaluating the relatedness of disaster-related stories and events in his paper [5]. The author used social media messages to describe real-world events through relatedness analysis, mining the content of these messages and then comparing the findings with the results of relatedness analysis on Twitter microblogs. This online unsupervised method was shown to be fast and accurate for near real-time event identification.

Fanpage Karma [6] is a platform that allows businesses to analyze their own and their competitors’ accounts across several social media platforms (Facebook, Google+, Pinterest, YouTube, Twitter, and Instagram). It provides a number of reports for the selected accounts and compares their respective performance.

LikeAlyzer [7] gives users the ability to check any Facebook page. One advantage is that it does not require access to Facebook Insights. It has an easy-to-use interface and provides a range of reports about the analyzed page.

Klear [8] is a platform for both influencer identification and influencer analysis. It enables users to search for influencers in different locations and categories (celebrities, power users, etc.) and to retrieve the top content of an account on several social media platforms (Facebook, Instagram, and Twitter).

Unlike the platforms described above, our application gives anybody the ability to analyze any topic on Facebook through user-defined categories. It gives insight into the values and norms of society and into how one aspect of life (e.g., politics) can influence another (e.g., vulgar speech).

 
   
The focus of this study is to demonstrate the benefits of analyzing social network data using big data tools. It introduces a particular implementation of social media analysis that lets anybody impose their own criteria and choose the data sources they want to analyze. In this way, every user can see how a particular event in a given location is reflected in people’s behavior and opinions. As an illustration, we present one particular case that shows how society reacts to an event in its environment.

The remainder of this paper is organized as follows. Section II explains the nature of the data and how it is collected, together with the details of the system architecture developed to analyze the collected data. Section III demonstrates the output of the developed system, while Section IV discusses the performance and usefulness of the proposed system. Section V concludes the paper by highlighting the importance of the research and future work.

II. METHODS AND MATERIALS

A. Web Application Functionality
This information system is a website that presents statistical information to the end user about user-defined categories in social networks. The system is designed to provide better insight into how much and when people are talking about predefined topics such as politics, sports, and culture. In this way, valuable reports can be created from the category information in order to analyze citizens’ opinions for different purposes. More specifically, the system retrieves data from the Facebook pages of popular target portals in the region or from personal accounts, and then groups the data into categories. Users can choose the categories they are interested in and are served with different kinds of graphs and charts that track how much the followers of a portal or account are talking about the chosen category in a defined period. Moreover, users can filter events of interest with a date range filter.

B. System Architecture
The SMA WEB APP [9, 10] is a Java Spring-based web application used for real-time analysis of data from the Facebook social network. The main feature that SMA uses for analysis is the taxonomy. A taxonomy consists of a set of categories, and each category has keywords bound to it. For every keyword, there is a list of synonyms that are treated as the same word (synonyms do not need to have the same meaning as the keyword; they are used to make categories more distinguishable). In the SMA WEB APP, the taxonomy is defined per user, so each user of the system has their own taxonomy and receives a personalized analysis. Fig. 1 shows the system architecture of the SMA WEB APP.

The SMA WEB APP is used to interact with the system: to create and manage accounts and their settings. As part of the settings, the user can configure the sources to follow on the Facebook social network. Sources are URLs of Facebook pages. The user can also modify the taxonomy to suit their needs. Moreover, the user has access to various reports on the social media data processed by the Categorization Engine. Reports such as the weekday punch card, keyword frequency analysis, time-based heatmaps, category distribution, category timeline, posts timeline, individual post analysis, and others are available in the SMA WEB APP. For data persistence, MongoDB [11] is used in a replicated and sharded configuration. The SMA WEB APP runs in a multi-process, multi-threaded environment in order to speed up data gathering.

Fig. 1 Architecture of SMA WEB APP system
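The multi-threaded data gathering can be illustrated with a small Java sketch. It is not the SMA WEB APP code: it swaps the Quartz scheduling for a plain `java.util.concurrent` thread pool, and `fetchPosts` and `categorize` are hypothetical stubs standing in for the Facebook API crawler and the Categorization Engine. Each source URL gets its own crawl-then-categorize task, so slow sources do not block the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: crawl several sources in parallel, then categorize what each crawl returns. */
final class ParallelCrawlSketch {

    // Stub for the crawler; the real system fetches posts through the Facebook API.
    static List<String> fetchPosts(String sourceUrl) {
        return List.of("post from " + sourceUrl);
    }

    // Stub for the Categorization Engine; here it only prefixes a placeholder label.
    static String categorize(String post) {
        return "[uncategorized] " + post;
    }

    /** Submits one crawl-then-categorize task per source to a fixed-size thread pool. */
    static List<String> crawlAll(List<String> sources, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String source : sources) {
                futures.add(pool.submit(() -> {
                    List<String> categorized = new ArrayList<>();
                    for (String post : fetchPosts(source)) {
                        categorized.add(categorize(post));
                    }
                    return categorized;
                }));
            }
            // Collect results in submission order, waiting for each task to finish.
            List<String> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                try {
                    results.addAll(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("crawl interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("crawl task failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder source URLs, not real pages.
        List<String> sources = List.of("https://www.facebook.com/page-a", "https://www.facebook.com/page-b");
        crawlAll(sources, 2).forEach(System.out::println);
    }
}
```

To re-run the crawl periodically, as a scheduler would, `crawlAll` could be wrapped in a `Runnable` and passed to `ScheduledExecutorService.scheduleAtFixedRate`; the interval would be a deployment choice.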


The scheduling of processes and threads is done by the Quartz scheduling library integrated with Spring. Each thread runs two tasks: crawling and categorization. The crawler uses the Facebook API [12], accessed through the RestFB library [13], to fetch data from the user-defined sources. The response data is encoded in the JSON (JavaScript Object Notation) format. After the crawler fetches the data from the API, it sends the data to the Categorization Engine. The categorization process tokenizes each Facebook post and its comments and, based on the tokens, searches for the categories related to the post. Categories are taken from the user-defined taxonomies. When categorization is completed, the result is the Facebook post or comment with the matching user-defined categories and keywords appended to it. Data about the Facebook user who created the post or comment is also kept in the SMA WEB APP. Each categorized post is saved into a MongoDB collection called POSTS.
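As a rough illustration of the tokenize-and-match step, the following Java sketch models a taxonomy as categories with keywords and synonyms, and reports for each match which keyword or synonym triggered it. The record names (`Category`, `Keyword`, `Match`) and the matching rule (lowercased tokens, phrase match on token boundaries) are assumptions for illustration; the paper does not specify how the Categorization Engine compares tokens. The `main` method uses the “politics” category from Fig. 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of taxonomy-based categorization: category -> keywords -> synonyms. */
final class CategorizerSketch {

    /** A keyword plus synonyms that count as the same keyword during categorization. */
    record Keyword(String term, List<String> synonyms) {}

    /** A user-defined category and its keywords. */
    record Category(String name, List<Keyword> keywords) {}

    /** Records which category matched, via which keyword, and which exact term triggered it. */
    record Match(String category, String keyword, String matchedTerm) {}

    /** Lowercases the text, splits it into letter/digit tokens and re-joins them, padded with spaces. */
    static String normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return " " + String.join(" ", tokens) + " ";
    }

    /** Returns at most one match per keyword of every category in the taxonomy. */
    static List<Match> categorize(String text, List<Category> taxonomy) {
        String haystack = normalize(text);
        List<Match> matches = new ArrayList<>();
        for (Category category : taxonomy) {
            for (Keyword keyword : category.keywords()) {
                List<String> terms = new ArrayList<>();
                terms.add(keyword.term());
                terms.addAll(keyword.synonyms());
                for (String term : terms) {
                    // Both sides are space-padded token sequences, so a term "trump" would not match
                    // "trumpet", and multi-word terms such as "Donald Trump" are matched as phrases.
                    if (haystack.contains(normalize(term))) {
                        matches.add(new Match(category.name(), keyword.term(), term));
                        break;
                    }
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The "politics" category from Fig. 2.
        Category politics = new Category("politics", List.of(
                new Keyword("Hillary Clinton", List.of("Democratic Party")),
                new Keyword("Donald Trump", List.of("Republican Party"))));
        System.out.println(categorize("The Republican Party held its convention.", List.of(politics)));
    }
}
```

Here the sample sentence is filed under “politics” through the keyword Donald Trump, and the `Match` also records that the synonym “Republican Party” was the term that actually occurred, which is the kind of traceability the Categorization Engine section describes.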

C. Categorization Engine
Our categorization engine allows every user to impose their own constraints according to which data will be grouped and analyzed. Every user can define any number of categories. These categories can be general topics such as politics, sport, and culture, or more specific topics such as winter tires, glasses, and ventilators. Further, every category can have many keywords. These keywords tend to be more specific concepts that help the categorization engine group posts more accurately. Finally, every keyword has its synonyms. These synonyms do not have to be synonyms in the linguistic sense; they exist only to make categorization more accurate. Therefore, once the engine starts categorizing posts, we know not only which category a particular post belongs to, but also which keyword or synonym placed the post in that category.

Fig. 2 shows how categories are created. As seen in Fig. 2, the “politics” category was created using two keywords (Hillary Clinton and Donald Trump), and each keyword has one synonym (Democratic Party and Republican Party, respectively). More keywords, and more synonyms per keyword, can be used when creating categories.

Fig. 2 Category creation process

III. RESULTS

This section demonstrates the usefulness of the developed application through two use cases. The first use case analyzes the Facebook pages of the four most popular portals (klix.ba, avaz.ba, sportsport.ba, ekskluziva.ba) in order to identify the most discussed topics in Bosnia and Herzegovina. It is assumed that the most discussed topics on these Facebook pages indicate people’s topics of interest. The second use case analyzes people’s reaction to the court decision of March 31, 2016 regarding Vojislav Šešelj, a well-known Serbian politician widely regarded as a war criminal.

A total of 570 MB of data was retrieved from the Facebook pages of the most popular portals in Bosnia and Herzegovina (klix.ba, avaz.ba, sportsport.ba, ekskluziva.ba). The data included 214,037 feeds from 83,039 different people between March 20 and April 12, 2016, of which 17,868 were posts and 196,169 were comments. The most frequent words in this dataset were used as categories for further analysis.

The category list for our system was created from these most frequently used words in order to analyze social media activities in Bosnia and Herzegovina. Using this category list, we analyzed how frequently these categories are mentioned on those pages, as well as on which days and at what times of day.

Fig. 3 depicts the categories, which were determined through offline analysis, and their presence on the Facebook pages of the four portals. It can be seen from the chart that social interactions comprise 39% of social activities, while, surprisingly, vulgar speech comprises 24% of the posts and comments on the target Facebook pages. These findings are quite interesting and valuable for analyzing society as a whole.

TABLE I shows the most frequent words in the collected data. Since the Facebook pages are in Bosnian, the most frequent words are naturally Bosnian; their English translations are given in parentheses in TABLE I. TABLE I gives an idea of the topics discussed on the monitored Facebook pages.

Fig. 4 and Fig. 5 present the number of posts and comments on social media by day of the week and by time of day, respectively. As seen in Fig. 4, the number of posts and comments is almost the same on each day, except that it is slightly higher from Tuesday to Friday than on Saturday, Sunday, and Monday. One reason could be that people spend time with their families and friends during the weekend and focus on work on Monday, so they have less time for following or writing on social media.
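Reports such as the weekday punch card (Fig. 4) and the hourly distribution (Fig. 5) reduce to counting posts and comments per time bucket. A minimal Java sketch, assuming each post or comment carries a creation `Instant` and that buckets are computed in the Europe/Sarajevo time zone (both are assumptions; the paper does not state how times are bucketed). The method names `perWeekday`, `perHour`, and `busiestHour` are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the activity reports: counts per weekday and per hour of day. */
final class ActivityReportSketch {

    // Assumption: timestamps are bucketed in local Bosnian time.
    static final ZoneId ZONE = ZoneId.of("Europe/Sarajevo");

    /** Number of posts/comments per day of the week; days without activity map to 0. */
    static Map<DayOfWeek, Integer> perWeekday(List<Instant> createdTimes) {
        Map<DayOfWeek, Integer> counts = new EnumMap<>(DayOfWeek.class);
        for (DayOfWeek day : DayOfWeek.values()) {
            counts.put(day, 0);
        }
        for (Instant t : createdTimes) {
            counts.merge(t.atZone(ZONE).getDayOfWeek(), 1, Integer::sum);
        }
        return counts;
    }

    /** Number of posts/comments per hour; index 20 covers 20:00-20:59 local time. */
    static int[] perHour(List<Instant> createdTimes) {
        int[] counts = new int[24];
        for (Instant t : createdTimes) {
            counts[t.atZone(ZONE).getHour()]++;
        }
        return counts;
    }

    /** Hour with the most activity (the earliest such hour in case of a tie). */
    static int busiestHour(int[] hourly) {
        int best = 0;
        for (int h = 1; h < hourly.length; h++) {
            if (hourly[h] > hourly[best]) {
                best = h;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented sample timestamps: two on Thursday, March 31, 2016, one on Saturday, April 2.
        List<Instant> times = List.of(
                ZonedDateTime.of(2016, 3, 31, 20, 15, 0, 0, ZONE).toInstant(),
                ZonedDateTime.of(2016, 3, 31, 20, 40, 0, 0, ZONE).toInstant(),
                ZonedDateTime.of(2016, 4, 2, 9, 5, 0, 0, ZONE).toInstant());
        System.out.println(perWeekday(times));
        System.out.println("Busiest hour: " + busiestHour(perHour(times)));
    }
}
```

Pre-filling every weekday with zero keeps quiet days visible in the punch card instead of silently omitting them.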
Fig. 3 Presence of categories in social media
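The offline analysis that produced the most frequent words, and from them the category list, is essentially a word count followed by a top-N selection. The sketch below is a hypothetical Java version; the tokenization rule, the minimum token length, and the stop-word set are assumptions, since the paper does not describe how function words were excluded from TABLE I.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a top-N word-frequency count like the one behind TABLE I. */
final class WordFrequencySketch {

    /** Counts lowercased letter-only tokens, skipping stop words and tokens shorter than minLength. */
    static Map<String, Integer> countWords(List<String> texts, Set<String> stopWords, int minLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!token.isEmpty() && token.length() >= minLength && !stopWords.contains(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns the n most frequent words, highest count first; ties are ordered alphabetically. */
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Toy input, not data from the study.
        List<String> texts = List.of("Istina je istina!", "Sramota, prava sramota.", "Istina boli.");
        Set<String> stopWords = Set.of("je", "prava"); // assumed stop-word list
        for (Map.Entry<String, Integer> e : topN(countWords(texts, stopWords, 3), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

On the collected data, the same two steps would run over all posts and comments, and the resulting ranking is what a table like TABLE I lists.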

Fig. 4 Daily social media activities

TABLE I. THE MOST FREQUENTLY USED WORDS

Rank  Keyword (English)    Occurrences
1     Istina (Truth)       3148
2     Sramota (Shame)      2435
3     J*** (Cursing)       1885
4     Zena (Woman)         1796
5     Facebook             1795
6     J**** (Cursing)      1759
7     Majka (Mother)       1753
8     Pare (Money)         1649
9     G**** (Cursing)      1551
10    Mater (Mother)       1404
11    Ubica (Murderer)     1360
12    Policija (Police)    1220
13    Sram (Shame)         1191
14    Bogu (God)           1133
15    Strasno (Scary)      1108
16    Budala (Fool)        1099
17    Boga (God)           1089
18    Zene (Women)         1079
19    Allah (God)          954

From Fig. 5, we can conclude that social media users are more active in the afternoon and most active around 20:00. Since we analyzed very broad categories, this analysis shows only when users are most active overall. However, such an analysis could be very useful for organizations that want to time their campaigns for when their target group is most active.

Moreover, the benefits of the SMA WEB APP are demonstrated on a concrete event. On March 31, 2016, the Serbian politician Vojislav Šešelj, who had been indicted for war crimes, was acquitted of all charges, including those related to the war in Bosnia and Herzegovina, by the International Criminal Tribunal for the former Yugoslavia (ICTY). Fig. 6 presents user activity in our categories with respect to this verdict.

It is seen from Fig. 6 that user activity on March 31 was clearly higher than on other days. This spike was detected through a single keyword, “Seselj”. In light of this event, we examined one more category, Vulgar Speech, in order to show the relation between categories. The results are depicted in Fig. 7.

Fig. 7 shows that the Šešelj verdict was accompanied by highly vulgar speech on social media, not only on that day but also on the day before and for several days after the verdict.

In addition to the statistics about posts and comments, a further keyword search can be performed among the already retrieved and categorized posts and comments. Fig. 8 shows the search results for “akciza” (excise) in the user-defined category “Politika” (Politics): the user has a category named Politika and wants to see the posts or comments that contain the search keyword “akciza”. As seen in Fig. 8, posts and comments from different Facebook pages are listed. The user can click any of them and is redirected to the Facebook page where the post or comment was retrieved, which shows the context in which the word is used.

IV. DISCUSSION

This paper introduces the SMA web application, which analyzes posts and comments on predefined Facebook pages according to categories created by each user. Results have shown that reports from the SMA WEB APP can be used
to see the interests of a nation or a small group and their activity patterns. If categories are defined well, and the target group and associated Facebook pages are selected carefully, very valuable information can be obtained from the reports.

This application enables any ordinary user to analyze social media activities. It helps individuals as well as organizations to follow certain events and be aware of others’ opinions.

The SMA WEB APP downloads all related posts and comments for each user separately and presents them in a way that makes activities in the social network recognizable. The amount of stored data will increase proportionally with the number of users. In addition, new posts and comments will be downloaded regularly in order to keep the statistics up to date. This growth in data size will slow down searches and, consequently, degrade the user experience. Improvements on the database side will be needed to keep searches fast enough to serve thousands of users.

V. CONCLUSION AND FUTURE WORK

Social media data analysis has attracted considerable attention from researchers, business owners, and social media users who want to become more aware of the behavior and thoughts of other social media users. This paper introduces software developed to analyze the feeds of selected Facebook accounts and to collect information about topics predefined by the user. The software, developed in Java, was used to analyze data collected from the Facebook pages of the most visited portals in order to demonstrate the functions of the application. Results have shown that the developed program is able to create reports about user-defined topics. This program will enable users to easily notice important events happening around them that they may want to be aware of. The software could be improved by using machine learning techniques to build better categories, and a stemming technique could be added to handle variations of the same word.
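TABLE I illustrates why stemming is worth adding: “Boga” and “Bogu”, or “Zena” and “Zene”, are counted as separate words, and the name “Šešelj” also appears as “Seselj”. The deliberately naive Java sketch below folds diacritics with `java.text.Normalizer` and strips a short, hypothetical list of inflectional endings. It is not a real Bosnian stemmer; the suffix list and the minimum stem length are assumptions, and a production system would use a proper language-specific stemmer or lemmatizer.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;

/** Hypothetical, deliberately naive sketch of diacritic folding plus suffix stripping. */
final class StemmingSketch {

    // Illustrative endings only, longest first; NOT a complete Bosnian stemmer.
    static final List<String> SUFFIXES = List.of("ama", "ima", "om", "em", "a", "e", "i", "u", "o");
    static final int MIN_STEM = 3;

    /** Lowercases the word and removes diacritics (letters with carons or acutes lose the mark). */
    static String fold(String word) {
        // The letter d-stroke (U+0111) has no Unicode decomposition, so it is mapped to "dj" explicitly.
        String lower = word.toLowerCase(Locale.ROOT).replace("\u0111", "dj");
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", ""); // drop combining marks such as the caron
    }

    /** Folds the word and removes the first matching suffix, keeping at least MIN_STEM characters. */
    static String stem(String word) {
        String w = fold(word);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // The fifth entry spells the name with carons (U+0160, U+0161) using Unicode escapes.
        for (String w : List.of("Boga", "Bogu", "Zena", "Zene", "\u0160e\u0161elj", "Seselj", "Sramota", "Sram")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

The sample shows both the benefit and the limit of such a rule: “Boga”/“Bogu” and “Zena”/“Zene” collapse to common stems and both spellings of the name become “seselj”, while “Sramota” and “Sram” still differ, which is why a real stemmer would be preferable.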

Fig. 5 Social media activity distribution during a day.

Fig. 6 User activities on March 31.


Fig. 7 Amount of posts and comments under Vulgar Speech category

Fig. 8 Feeds search results.
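The feed search of Fig. 8, combined with the date range filter offered for the charts, can be sketched as a filter over already categorized posts. The `CategorizedPost` fields, the case-insensitive substring match, and the inclusive date range are assumptions for illustration; in the SMA WEB APP an equivalent query would presumably run against the POSTS collection in MongoDB rather than an in-memory list. The sample posts and URLs are invented placeholders.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the feed search: keyword search inside one category and a date range. */
final class FeedSearchSketch {

    /** A post or comment after categorization; the field set is an assumption. */
    record CategorizedPost(String page, String text, Set<String> categories, LocalDate date, String url) {}

    /** Posts in the category whose text contains the keyword (case-insensitive), dated in [from, to], newest first. */
    static List<CategorizedPost> search(List<CategorizedPost> posts, String category, String keyword,
                                        LocalDate from, LocalDate to) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        return posts.stream()
                .filter(p -> p.categories().contains(category))
                .filter(p -> p.text().toLowerCase(Locale.ROOT).contains(needle))
                .filter(p -> !p.date().isBefore(from) && !p.date().isAfter(to))
                .sorted(Comparator.comparing(CategorizedPost::date).reversed())
                .toList();
    }

    /** Invented sample data in the spirit of Fig. 8 (pages and URLs are placeholders). */
    static List<CategorizedPost> samplePosts() {
        return List.of(
                new CategorizedPost("page-a", "Nova akciza na gorivo", Set.of("Politika"),
                        LocalDate.of(2016, 4, 1), "https://example.com/posts/1"),
                new CategorizedPost("page-b", "Akciza opet raste", Set.of("Politika"),
                        LocalDate.of(2016, 3, 25), "https://example.com/posts/2"),
                new CategorizedPost("page-a", "Utakmica je bila odlicna", Set.of("Sport"),
                        LocalDate.of(2016, 4, 1), "https://example.com/posts/3"));
    }

    public static void main(String[] args) {
        List<CategorizedPost> hits = search(samplePosts(), "Politika", "akciza",
                LocalDate.of(2016, 3, 20), LocalDate.of(2016, 4, 12));
        // Each hit keeps a link back to the original post, like the clickable results in Fig. 8.
        hits.forEach(p -> System.out.println(p.date() + " " + p.page() + " " + p.url()));
    }
}
```

Keeping the URL on every categorized post is what makes the click-through to the original context cheap: the search only has to return stored records, not re-query Facebook.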

REFERENCES

[1] A. D. Noyes, “Top 20 Facebook Statistics - Updated July 2016,” Zephoria Inc., 29-Sep-2016. [Online]. Available: https://zephoria.com/top-15-valuable-facebook-statistics/. [Accessed: 11-Oct-2016].
[2] C. L. Philip Chen, D. Tao, and X. You, “Big learning in social media analytics,” Neurocomputing, vol. 204, pp. 1–2, Sep. 2016.
[3] A. Tripathy, A. Agrawal, and S. K. Rath, “Classification of Sentimental Reviews Using Machine Learning Techniques,” Procedia Comput. Sci., vol. 57, pp. 821–829, 2015.
[4] Z. Z. Alp and Ş. G. Öğüdücü, “Extracting Topical Information of Tweets Using Hashtags,” in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, 2015, pp. 644–648.
[5] C.-H. Lee, “Unsupervised and supervised learning to evaluate event relatedness based on content mining from social-media streams,” Expert Syst. Appl., vol. 39, no. 18, pp. 13338–13356, Dec. 2012.
[6] U. GmbH, “Analyze and improve fan pages - Fanpage Karma.” [Online]. Available: http://www.fanpagekarma.com/. [Accessed: 07-Sep-2017].
[7] Meltwater, “Analyze your Facebook page - LikeAlyzer,” LikeAlyzer. [Online]. Available: http://www.likealyzer.com/. [Accessed: 07-Sep-2017].
[8] “Klear - Influencer Marketing Platform,” Klear is a social intelligence platform that helps brands do smarter marketing. [Online]. Available: https://klear.com/. [Accessed: 07-Sep-2017].
[9] ibecir, “ibecir/socialmediaanalysis,” GitHub. [Online]. Available: https://github.com/ibecir/socialmediaanalysis. [Accessed: 08-Sep-2017].
[10] “SMA WEB APP,” Social Media Analysis Web Application. [Online]. Available: http://sma.ibu.edu.ba:8080/. [Accessed: 13-Sep-2017].
[11] “MongoDB Documentation.” [Online]. Available: https://docs.mongodb.com/. [Accessed: 07-Sep-2017].
[12] “Facebook for Developers,” Facebook for Developers. [Online]. Available: https://developers.facebook.com/. [Accessed: 07-Sep-2017].
[13] “restfb,” restfb. [Online]. Available: http://restfb.com/documentation/. [Accessed: 07-Sep-2017].
