Abstract: A basic issue in online social networks is giving users the ability to manage the messages posted on their walls. Online social networks offer only minimal assistance in keeping unwanted content off a user's wall. To improve this support, a system is designed that filters unwanted messages and gives users direct control over the messages posted on their walls. This is achieved with a flexible rule-based system that lets users specify filtering rules for their walls. In addition, inference algorithms are used to infer new information from the filtering rules and so increase the efficiency of the filtering process. A Machine Learning based soft classifier is employed to support content-based filtering.

Index Terms: Online social networks, information filtering, short text classification

I. INTRODUCTION

Advances in computing and communication technologies enable people to get together and share information in innovative ways. Social networking sites are a familiar interactive medium for sharing a significant amount of personal information. In these networks, several types of content, such as text, images, audio, and video, are exchanged every day. According to Facebook statistics, an average user creates 90 pieces of content each month, while more than 30 billion pieces of information are shared each month.

Information in social networks changes dynamically, and users are overwhelmed by large amounts of raw data. Content mining strategies are employed to extract valuable information hidden within this enormous amount of data. These techniques provide effective support for complex tasks in online social networks, such as access control or information filtering.

Information filtering is the process of providing appropriate information to the people who need it. It mainly concerns textual documents, specifically web content [1, 2, 3], and offers the user a classification mechanism for avoiding unnecessary information. In online social networks, this filtering process serves a further purpose.

In online social networks, users can post or comment on others' public or private walls. Information filtering techniques can help users automatically control the unwanted messages posted on their walls. This is a key online social network service that has not been provided up to now. Online social networks do provide minimal support for keeping unwanted messages off the wall. For instance, Facebook offers three different methods for filtering the messages on a user's wall.

The first method states which users are allowed to post messages on the wall: friends, friends of friends, or a defined group of friends. Only these users may post on that user's wall. The second method blocks specific users from posting messages on the wall. The third method hides undesired posts from the timeline; it merely conceals the posted message.

The key limitation of the existing methods is that they cannot filter incoming posts based on their content. They just block the user who posts the messages. Blocking a user is not necessary in every situation, yet because the existing methods do not support content-based filtering, it is impossible to prevent an undesired message without considering who posted it. We believe that this is a key online social network service that has not been provided so far.

To facilitate content-based filtering, this article introduces the Filtered Wall architecture, which filters incoming posts based on their content.

The remainder of the paper is organized as follows. Section 2 surveys related work, Section 3 presents the architecture and concepts of the proposed system, and Section 4 illustrates the inference of new rules from existing rules. Finally, Section 5 concludes the paper.

II. RELATED WORK

The main goal of this paper is to design a system that provides customizable content-based message filtering for online social networks, based on machine learning techniques. Information filtering systems are designed to categorize dynamically generated information and to present to users the information that fulfills their requirements [6]. In a content-based filtering system, each user is assumed to operate independently. So the filtering system selects the information based on the
correlation between the content of the items and the user's preferences.

This contrasts with a collaborative filtering system, which selects information based on the correlation between people with similar preferences [7], [8]. Information filtering was initially used to categorize email messages; subsequent papers have addressed various other domains, such as newswire articles, Internet news articles, and network resources [9], [10], [11]. Content-based filtering mostly processes textual documents, which makes it close to text classification.

In fact, filtering can be modeled as a case of single-label, binary classification that partitions incoming documents into relevant and nonrelevant categories [12]. More complex filtering systems involve multilabel text categorization, which automatically labels messages with partial thematic categories.

Content-based filtering is mainly based on the ML paradigm, in which a classifier is automatically induced by learning from a set of preclassified examples. The feature extraction procedure maps text into a compact representation of its content and is applied uniformly to the training and generalization phases. The Bag-of-Words (BoW) approach yields good performance and generally prevails over more sophisticated text representations that may have superior semantics but lower statistical quality [13], [14], [15].

There is a variety of key approaches to content-based filtering and text classification. Depending on the application, each approach has its own advantages and disadvantages. An in-depth comparative analysis [4] has confirmed the superiority of classifiers such as Boosting-based classifiers [16], Neural Networks [17], [18], and Support Vector Machines [19] over other popular methods, such as Rocchio [20] and Naive Bayesian [21]. However, most work on ML-based text categorization has been applied to long-form text, and the measured performance of text classification methods depends strictly on the nature of the textual documents.

Content-based filtering of messages posted on online social network user walls poses additional challenges, given the short length of these messages and the wide range of topics. It is difficult to define robust features, essentially because short texts are concise and contain many misspellings, nonstandard terms, and noise. Zelikovitz and Hirsh [22] try to improve the classification of short text streams by developing a semi-supervised learning strategy based on a combination of labeled training data and a corpus of unlabeled related documents.

This solution is inapplicable to online social networks, in which short messages are not summaries or parts of longer, semantically related documents. Another approach, proposed by Bobicev and Sokolova [23], avoids the problem of error-prone feature construction by adopting a statistical learning method that can work reasonably well without feature engineering. However, this method, named Prediction by Partial Matching, produces a language model that is used in probabilistic text classifiers, which are hard classifiers by nature and do not easily integrate soft, multimembership paradigms. In our scenario, gradual membership in classes is a key feature for defining flexible policy-based personalization strategies and needs to be considered.

III. CONTENT BASED FILTERING

To support content-based filtering in online social networks, the Filtered Wall architecture is introduced. In this architecture, text mining techniques are employed to categorize incoming messages. Traditional text classification methods have a major inadequacy in classifying short text messages, because short messages do not have sufficient word occurrences. An automated system called Filtered Wall is designed in this paper to filter unwanted messages from user walls.

In this system, Machine Learning based text categorization techniques [4] are used to automatically assign each short text message a set of categories based on its content. A Short Text Classifier is built to accurately extract and select a set of discriminating features from the message. A neural learning model is employed for efficient text classification. In particular, a Radial Basis Function Network (RBFN) [5] acts as a soft classifier to handle noisy data and intrinsically vague classes. The neural model is embedded in a hierarchical two-level classification. At the first level, the RBFN classifies short messages as Neutral or Nonneutral; at the second level, Nonneutral messages are classified according to their appropriateness to each of the considered categories.

In addition to the classification facilities, the system offers a robust rule layer for specifying Filtering Rules (FRs) in a flexible language. Using it, users can specify what content should not be displayed on their walls. Different kinds of filtering rules can be combined and customized according to the user's needs. The system also supports user-defined Blacklists (BLs), that is, lists of users who are temporarily blocked from posting messages on the user's wall.

[Figure: block diagram with the components Filtered Wall GUI, Short Text Classification, Content Based Filtering, Filtering Rules, Blacklists Rules, and Social Network Manager]

Fig. 1. Filtered Wall Architecture
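The hierarchical two-level soft classification described in Section III can be illustrated with a minimal Python sketch. It is not the trained RBFN of the paper: the Gaussian basis functions sit on hand-picked prototype vectors, and the two-dimensional feature space, the category names (violence, vulgar), the width, and the neutral threshold are all hypothetical values chosen only for illustration.

```python
import math


def rbf(x, center, width=0.5):
    """Gaussian radial basis activation of feature vector x around a center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


# Hypothetical prototypes in a 2-D feature space (illustrative values only;
# a real RBFN would learn its centers and weights from labeled messages).
NEUTRAL_CENTER = (0.0, 0.0)
CATEGORY_CENTERS = {
    "violence": (1.0, 0.0),
    "vulgar": (0.0, 1.0),
}


def classify(x, neutral_threshold=0.5):
    """Two-level soft classification of one message feature vector.

    Level 1: decide Neutral vs Nonneutral from the neutral membership.
    Level 2: for Nonneutral messages, return a membership degree in [0, 1]
    for every category instead of a single hard label.
    """
    neutral_score = rbf(x, NEUTRAL_CENTER)
    if neutral_score >= neutral_threshold:
        return {"neutral": True, "memberships": {}}
    memberships = {name: rbf(x, c) for name, c in CATEGORY_CENTERS.items()}
    return {"neutral": False, "memberships": memberships}
```

Returning a membership degree per category, rather than one label, is what lets a later rule layer apply user-specific thresholds to the same classifier output.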
2014 International Conference on Circuit, Power and Computing Technologies [ICCPCT]
The architecture supporting online social network services comprises three major components (Figure 1): the Social Network Manager, Short Text Classification, and Content Based Filtering. The Social Network Manager (SNM) offers the basic online social network functionalities, such as profile management and relationship management. Short Text Classification is employed to classify incoming messages.

Content Based Filtering supports message filtration. Specifically, users interact with the system via a GUI to set up and manage their FRs/BLs. Moreover, the GUI provides users with a Filtered Wall (FW), that is, a wall where only messages that are authorized according to their FRs/BLs are published. As depicted in Figure 1, the path followed by a message, from its writing to its possible final publication, can be summarized as follows:

1. After entering the private wall of one of his/her contacts, the user tries to post a message, which is intercepted by the Filtered Wall.
2. A Machine Learning-based text classifier extracts metadata from the content of the message.
3. The Filtered Wall uses the metadata provided by the classifier, together with data extracted from the social graph and users' profiles, to enforce the filtering and Black List rules.
4. Depending on the result of the previous step, the message is published or filtered by the Filtered Wall.

The core components of the proposed system are the Short Text Classifier module and the Content-Based Messages Filtering (CBMF) module.

A. Short Text Classifier

The short text classifier aims to classify messages according to a set of categories. To that end, a classifier is built to extract and select the discriminating features of a short text message. Short text classification comprises two phases: text representation and Machine Learning based classification.

In the first phase, the short text message is represented in the vector space model [24], which puts the text in a format suitable for extracting its discriminant features. In this model, a short text message is represented as a vector of weights, Dj = (w1j, w2j, ..., wTj). The term frequency-inverse document frequency (tf-idf) weighting function is employed to calculate the weight of each term in the message.

In the second phase, a neural network classifier is employed to classify the incoming message. It automatically assigns the short message to the suitable category, neutral or non-neutral. Non-neutral messages are further analyzed to determine their appropriateness to each category.

B. Content-Based Messages Filtering (CBMF)

CBMF exploits the message categorization to enforce the Filtering Rules specified by the user. First of all, in an online social network, the same message may have different meanings and relevance depending on who writes it. As a consequence, FRs should allow users to state constraints on message creators. The creators to which an FR applies can be selected on the basis of several different criteria; one of the most relevant is imposing conditions on their profile attributes. In this way it is possible, for instance, to define rules applying only to young creators or to creators with a given religious/political view.

Given the social network scenario, creators may also be identified by exploiting information on their social graph. In addition to its author, each filtering rule comprises three essential elements: creator specification, content specification, and action. The creator specification indicates the criteria for selecting creators. The content specification states the content to be filtered according to the user's preferences. The action is either to notify the user or to block the message.

FR = (author, creatorSpec, contentSpec, action)

In general, more than one filtering rule can apply to the same user. A message is therefore published only if it is not blocked by any of the filtering rules that apply to the message creator. Note, moreover, that a user profile may not contain a value for the attribute(s) referred to by an FR (e.g., the profile does not specify a value for the attribute Hometown, whereas the FR blocks all messages authored by users coming from a specific city).

Black List rules can also be used to enhance the filtering process. The BL mechanism avoids messages from undesired creators, independent of their contents. BLs are directly managed by the system, which should be able to determine which users are to be inserted in the BL and to decide when a user's retention in the BL is finished.

To enhance flexibility, such information is given to the system through a set of rules, called BL rules. Wall owners specify who has to be banned from their walls and for how long. Banning considers two major measures: the number of times a user has been inserted in the blacklist within a certain amount of time, and whether the user's messages continue to fail the filtering rules.

A blacklist rule comprises four essential elements: author, creator specification, creator behavior, and time. The author is the user who specifies the rule. The creator specification identifies the users to whom the rule applies. The creator behavior defines the banning criteria. The time specifies how long the user is banned from posting messages.

BL = (author, creatorSpec, creatorBehavior, T)
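The FR and BL tuples above can be sketched as plain data structures with a small decision function. This is a minimal Python illustration under stated assumptions, not the paper's rule language: the field types, the use of per-category membership thresholds as contentSpec, the precedence of "block" over "notify", and the helper names rule_triggers, decide, and should_ban are all hypothetical. How a missing profile attribute is handled is left to the creator_spec predicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Profile = Dict[str, object]      # creator profile attributes
Memberships = Dict[str, float]   # category -> membership degree from the classifier


@dataclass
class FilteringRule:
    """FR = (author, creatorSpec, contentSpec, action)."""
    author: str                               # wall owner who specified the rule
    creator_spec: Callable[[Profile], bool]   # selects the creators the rule applies to
    content_spec: Dict[str, float]            # assumed form: category -> minimum membership
    action: str                               # "block" or "notify"


@dataclass
class BlacklistRule:
    """BL = (author, creatorSpec, creatorBehavior, T)."""
    author: str
    creator_spec: Callable[[Profile], bool]
    creator_behavior: Callable[[int], bool]   # assumed form: test on a blocked-message count
    duration_days: int                        # T, the length of the ban


def rule_triggers(rule: FilteringRule, profile: Profile, memberships: Memberships) -> bool:
    """A rule fires when the creator matches and some category reaches its threshold."""
    if not rule.creator_spec(profile):
        return False
    return any(memberships.get(cat, 0.0) >= thr for cat, thr in rule.content_spec.items())


def decide(rules: List[FilteringRule], profile: Profile, memberships: Memberships) -> str:
    """Publish only if no applicable rule blocks the message."""
    actions = {r.action for r in rules if rule_triggers(r, profile, memberships)}
    if "block" in actions:
        return "block"
    if "notify" in actions:
        return "notify"
    return "publish"


def should_ban(rule: BlacklistRule, profile: Profile, blocked_count: int) -> bool:
    """Insert the creator in the blacklist when the behavior criterion is met."""
    return rule.creator_spec(profile) and rule.creator_behavior(blocked_count)
```

For example, a rule built as `FilteringRule("alice", lambda p: p.get("age") is not None and p["age"] < 25, {"vulgar": 0.8}, "block")` would block a message with vulgar membership 0.9 from a 19-year-old creator and let the same message from a 40-year-old creator through; in this sketch a creator with no age in the profile does not match the rule.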