Thought Bubbles: a conceptual prototype for a Twitter based recommender system for research 2.0

Draft Version of i-Know 2012 publication
Published by: Martin on Sep 08, 2012
Copyright:Attribution Non-commercial

THOUGHT BUBBLES
A Conceptual Prototype for a Twitter based Recommender System for Research 2.0
Patrick Thonhauser¹, Selver Softic¹, Martin Ebner¹
¹ Department for Social Learning, Institute for Information Systems and Computer Media, Graz University of Technology, Austria
patrick.thonhauser@gmail.com, softic.s@gmail.com, martin.ebner@tugraz.at
Keywords: Recommender System, Twitter, Thought Bubble, Classification, Social, Data Mining

Abstract: The concept of so-called Thought Bubbles deals with the problem of finding appropriate new connections within Social Networks, especially Twitter. As a side effect of exploring new users, Tweets are classified and rated and are used for generating a kind of news feed, which will extend the personal Twitter feed. Each user has several interests that can be classified by evaluating his or her Tweets in the first place and secondly by evaluating user related and already existing contacts. By categorizing a user and concerned connections, one can be placed in an imaginary category specific subset of users, called a Thought Bubble. Following the trace of people who are also active within the same specific Thought Bubble should reveal interesting and helpful connections between similar minded users.
1 INTRODUCTION
Twitter has grown tremendously in the last few years and is generating 200 million Tweets and 1.6 million search queries each day. As of now (2012), Twitter has over 250 million users¹. These are pretty impressive numbers for a micro blogging/social-network platform and Twitter has already become a cultural phenomenon. Every day people all over the world are communicating via Twitter, exchanging the latest news and discussing millions of diverse topics. The list of tweetable actions is almost infinite and everybody who is interested in a specific person or a specific topic has the ability to consume the knowledge by reading certain Tweets or exploring the tweeted resources.

However, the interesting questions for researchers are how to make use of the information contained within millions of Tweets and what to extract from those 140 character micro blogs. How much useful information is in a Tweet and how can we separate meaningful information from noise? This paper presents a novel concept for finding new interesting users and information for a specific Twitter account. Many researchers have already solved parts of this puzzle and several parts of these concepts are based on findings of (Softic et al., 2010), (Mika and Laniado, 2010) and (Choudhury and Breslin, 2010). To Semantic Web researchers, Twitter has become one of the most popular applications for the dissemination of information (Kraker et al., 2010) and it is therefore a legitimate candidate to serve as the main source for mining data concerning users and provided information of scientific interest.

This paper doesn't serve as a detailed description of a forthcoming semantic recommender system for research 2.0, but rather as a brief overview of a proof of concept application, whose main task is the classification and recommendation of Twitter users. Preliminary results of this extensive categorization task are also presented in this paper.

¹ http://thesocialskinny.com/100-social-media-statistics-for-2012/ (April 2012)
2 CONCEPT
Twitter users follow other users for specific reasons. In the majority of cases these reasons are concerned with similar fields of interest. Nonetheless, this doesn't mean the connection between Twitter users with similar interests is bidirectional. When social network connections aren't bidirectional, an individual user doesn't necessarily have to know his or her followers. Presumably, the follower is interested in and involved with similar topics as the person he or she follows. Therefore, there is a high probability that friends and other colleagues of the followed user have similar connections, which can be of certain interest for a specific user.

Draft version, originally published in: Patrick Thonhauser, Selver Softic, and Martin Ebner. 2012. Thought Bubbles: a conceptual prototype for a Twitter based recommender system for research 2.0. In Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW '12). ACM, New York, NY, USA, Article 32, 4 pages. DOI=10.1145/2362456.2362496

A user is active in several kinds of topic based bubbles, where the participating users do not necessarily know all participants of such a bubble. However, in most cases, one doesn't have just one special kind of interest and he or she is part of several topic based subsets of users. Hence, users within one user's specific bubble might be of interest for each other.

Figure 1 shows an example of a so-called network graph, which reveals the sphere of activity within diverse Thought Bubbles. Users marked with a star (*) are potentially of big interest for this account (blue highlighted in figure 1). These users belong to the same topic specific bubble, in this case the Science Bubble. However, the connection between the yellow marked account and the accounts marked with a star isn't bidirectional either.
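[Figure 1 graphic: network graph with the bubbles Science Bubble, Music Bubble and Developer Bubble; users of potential interest are marked with a star (*).]
Figure 1: This is an example of how a user can be placed in a Twitter network graph.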
This implies that following a specific user of a certain field opens up a high probability of finding further relevant users who are also active in a field of specific relevance. The missing bidirectionality of certain user connections hints at interest-only relationships. Being conscious of this led to the concept of Thought Bubbles. This concept makes it possible to recommend people and information that are contained within a bubble and have not been explored by a specific Twitter user so far.
3 SYSTEM MODULES
The conceptual realization of Thought Bubbles can be split into several sub modules.
3.1 Finding potentially interesting users
The first sub module deals with the problem of separating users that merely produce noise or spam from those that spread news, personal thoughts and facts. To simplify this process we have to define the pool of people who are connected to one's Twitter account. This connection exists because one is following other users or because other users are following oneself. We call this pool of people the inner circle. Separating the inner circle of people by filtering useful information provided by those people helps to reveal further accounts of potential interest, which are hidden in the so-called outer circle. The outer circle of people represents the connections of every person acting within one's inner circle. Subsequently, a second cycle of filtering is performed to efficiently narrow down and identify the people of potential interest.

(Horn, 2010) uses Support Vector Machines (SVMs) for this quite rough classification task. SVMs are a commonly used technique for text classification and are recommended by many researchers like (Rios and Zha, 2004), (Hsu et al., 2010) or (Nakagawa et al., 2001). By applying this method, a potentially interesting set of users would remain for further consideration. Also, applying a POS-tagger and a chunker in advance could help to ascertain whether a Twitter account belongs to a person. Eliminating duplicates within this set and eliminating the accounts that one already follows usually leads to a quite clearly arranged set of Twitter accounts that is worth exploring in depth.
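The two filtering cycles described in this section can be illustrated with a minimal Python sketch. It is not part of the prototype: the `follows` mapping (account name to the set of accounts it follows), the function names and the `is_noise` predicate are assumptions made for illustration, and `is_noise` merely stands in for the SVM based noise/spam classifier mentioned above.

```python
def inner_circle(account, follows):
    """Accounts that `account` follows plus accounts that follow `account`."""
    following = follows.get(account, set())
    followers = {user for user, targets in follows.items() if account in targets}
    return (following | followers) - {account}


def outer_circle_candidates(account, follows, is_noise=lambda user: False):
    """Accounts connected to the inner circle that are not yet known to `account`.

    `is_noise` is a hypothetical placeholder for a trained noise/spam classifier.
    """
    full_inner = inner_circle(account, follows)
    # First filtering cycle: keep only inner circle members that are not noise.
    useful_inner = {user for user in full_inner if not is_noise(user)}

    candidates = set()  # a set removes duplicates automatically
    for member in useful_inner:
        candidates |= inner_circle(member, follows)

    # Drop the account itself and everyone already in its inner circle
    # (which includes all accounts it already follows).
    candidates -= full_inner | {account}

    # Second filtering cycle: remove noisy accounts from the outer circle.
    return {user for user in candidates if not is_noise(user)}
```

Using sets keeps duplicate elimination implicit; a real system would replace the in-memory `follows` mapping with data collected through the Twitter API.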
3.2 Categorization of users
Granular categorization of users is the most complex task within this system. In the first place it's necessary to categorize the active user who uses the Thought Bubble service. In the very beginning, a set of appropriate categories that covers all possible interests a user could have has to be defined. Examples of such categories are developing, science, teaching, etc.

To be able to classify a user, it's necessary to process one's Tweet history. The first step is to annotate words in a user's Tweets, which can be performed by applying Natural Language Processing (NLP) techniques (Ritter et al., 2011). Classifying Tweets is a very special task compared to the usual classification of text artefacts. The reasons are: (a) the shortness of Tweets (140 character strings), (b) the often changing context in which a word is used and (c) the above average occurrence of out of vocabulary words. By tagging all words in Tweets (Part-Of-Speech tagging), the elimination of unimportant words like copulas or prepositions can be realized. (Gimpel et al., 2011), for example, already developed a POS-tagger especially for the needs of Twitter. Summarizing the results of all categorized user Tweets leads to a percentage-based classification of a user. There are several techniques available and approved for realizing this classification task. As mentioned in section 3.1, SVMs can be used for such a task, as applied by (Nakagawa et al., 2001). But there are several other ways of accomplishing this classification, like using Bayesian approaches (Goldwater and Griffiths, 2005). Future testing and evaluation will clarify the best way of realizing the categorization of Twitter users. One possibility for evaluation is presented by (Chen et al., 2009).
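How per-Tweet categories might be summarized into a percentage-based user classification can be sketched as follows. This is only an illustration under stated assumptions: the keyword lists replace the SVM or Bayesian classifier discussed above, the stop word list replaces POS based filtering of unimportant words, and all names (`CATEGORY_KEYWORDS`, `STOP_WORDS`, `classify_tweet`, `user_profile`) are hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists; a stand-in for a trained SVM or Bayesian classifier.
CATEGORY_KEYWORDS = {
    "developing": {"python", "code", "api", "bug"},
    "science": {"paper", "research", "experiment", "conference"},
    "teaching": {"students", "lecture", "course", "classroom"},
}

# Stand-in for POS based removal of unimportant words (copulas, prepositions, ...).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "of", "for", "to", "my", "with"}


def tokenize(tweet):
    """Lowercase the Tweet, strip punctuation and drop stop words."""
    words = (word.strip(".,!?:;#@\"'()").lower() for word in tweet.split())
    return [word for word in words if word and word not in STOP_WORDS]


def classify_tweet(tweet):
    """Return the category with the most keyword hits, or None if nothing matches."""
    tokens = tokenize(tweet)
    scores = {
        category: sum(token in keywords for token in tokens)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def user_profile(tweets):
    """Summarize all categorized Tweets into percentages per category."""
    counts = Counter(c for c in map(classify_tweet, tweets) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```

Tweets that match no category are ignored here rather than counted as a separate class; whether that is appropriate would have to be decided during evaluation.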
3.3 Additional Ratios
In addition to measuring the similarity of Thought Bubble attributes, i.e. the affiliation of a user with a category, several other ratios for determining the importance of a user's recommendation are used to sharpen the prediction accuracy. The following ratios are legitimate candidates for additionally influencing whether a user within a topic related bubble will be recommended or not.
• Tweet Frequency is the number of Tweets a Twitter user posts within a defined period of time.
• The Follower ratio. The more followers a user has, the more influence or credibility he or she might possess. On the other hand, a user who has very few followers but is following a huge number of other users might be a Blast Follower².
• The number of retweets a user's Tweets receive indicates the reach of that user's reputation.
• If an observed user isn't connected with the inner circle bidirectionally, this denotes not a friendship but a purely interest related relationship.
• Clients will have the possibility to rate recommended users or Tweets as "interesting" or "not interesting" for a specified category. By comparing users who are rated as interesting with potential recommendations for a Thought Bubble, the similarity between them can also influence a user's overall rating score within a bubble.

These ratios could help to sharpen the selection of recommended Tweets and Twitter users. However, the main task in applying these ratios is to find an appropriate weighting scheme for every ratio.

² http://www.makeuseof.com/dir/blastfollow-mass-follow-twitter-users/ (April 2012)
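The first four ratios listed above could be computed roughly as in the following Python sketch. The exact definitions are not given in this paper, so every formula, function name and threshold here (in particular `max_ratio` and `min_following`) is an assumption chosen for illustration; the client rating feedback is not covered.

```python
def tweet_frequency(tweet_count, period_days):
    """Average number of Tweets per day within the observed period."""
    return tweet_count / period_days


def follower_ratio(followers, following):
    """Followers per followed account; max() avoids a division by zero."""
    return followers / max(following, 1)


def looks_like_blast_follower(followers, following, max_ratio=0.1, min_following=1000):
    """Heuristic: very few followers while following a huge number of accounts.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    return following >= min_following and follower_ratio(followers, following) <= max_ratio


def retweets_per_tweet(retweets_received, tweet_count):
    """Average number of retweets a user's Tweets receive."""
    return retweets_received / max(tweet_count, 1)


def is_interest_only(user_follows_inner, inner_follows_user):
    """True if the connection to the inner circle exists in one direction only."""
    return user_follows_inner != inner_follows_user
```

Before such values can be weighted against each other they would presumably have to be normalized to a common range, which is left open here.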
3.4 Recommendation
Recommendation decisions are made by calculating ratings for each potentially interesting user, based on their category classification and the additional ratios mentioned in section 3.2 and section 3.3. Subsequently, the category classification of an active service user is compared to the classified categories of potentially interesting other users. All additional ratios have different weights, which will finally influence the position of a user in the final recommendation list. Definite weight values for those ratios have to be found during development and test runs of the system and therefore cannot be predicted in advance.
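One possible way to turn this description into a ranking is sketched below in Python. The paper does not fix a similarity measure or a combination rule, so the use of cosine similarity, the linear combination and all weight values (`CATEGORY_WEIGHT`, `RATIO_WEIGHTS`) are assumptions; the additional ratios are assumed to be normalized to the range 0 to 1 beforehand.

```python
import math

# Hypothetical weights; the paper states that real values must be found in test runs.
CATEGORY_WEIGHT = 0.7
RATIO_WEIGHTS = {"tweet_frequency": 0.05, "follower_ratio": 0.10, "retweets": 0.15}


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two category profiles (category -> percentage)."""
    categories = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(c, 0.0) * profile_b.get(c, 0.0) for c in categories)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rating(active_profile, candidate_profile, candidate_ratios):
    """Weighted sum of category similarity and normalized additional ratios."""
    similarity = cosine_similarity(active_profile, candidate_profile)
    ratio_score = sum(
        weight * candidate_ratios.get(name, 0.0)
        for name, weight in RATIO_WEIGHTS.items()
    )
    return CATEGORY_WEIGHT * similarity + ratio_score


def recommend(active_profile, candidates):
    """Return candidate names sorted by descending rating.

    `candidates` maps a user name to a (profile, ratios) tuple.
    """
    return sorted(
        candidates,
        key=lambda name: rating(active_profile, *candidates[name]),
        reverse=True,
    )
```

A linear combination keeps the influence of each weight easy to inspect during test runs, but other schemes would fit the description equally well.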
4 DEMO APPLICATION
The Thought Bubble Server will be implemented in Python and will run on an Apache 2 web server. Figure 2 visualizes the potential infrastructure of this system.
[Figure 2 graphic: components Twitter API, REST API, SQLite Database, Database Wrapper, Database Operations Thread, Classification Worker Threads, Tweet Collector, Rater and Clients (iOS, Web, etc), with the labels External and Internal.]
Figure 2: Thought Bubble infrastructure.
Twitter related API calls which affect or are significant for the classification and recommendation task are processed and cached by the Thought Bubble server. The REST API acts as a junction between the Twitter REST API and the client. All requests which don't affect the functionality of the Thought Bubble system are directly processed by the Twitter REST API. Once the system has completed categorizing and rating potential recommendations for the first time after a user starts to use this service, it starts to enrich the Twitter stream with Tweets from recommended persons. Recommendation of single Tweets is based on the influence a Tweet has had during classification of a certain user. Thought Bubble clients can be used just like usual Twitter clients for reading one's personal Twitter stream, tweeting or direct messaging. However, the big difference is that
