A Framework Concept for Profiling Researchers on Twitter using the Web of Data

Publication for International Conference on Web Information Systems and Technologies (WEBIST) 2013, Aachen, Germany

Published by: Martin on May 14, 2013
Copyright:Attribution Non-commercial

A Framework Concept for Profiling Researchers on Twitter using the Web of Data
Selver Softic [1], Laurens De Vocht [2]
[1] Department for Social Learning, Graz University of Technology, Graz, Austria
[2] Multimedia Lab, Gent University - iMinds, Gent, Belgium
selver.softic@tugraz.at, laurens.devocht@ugent.be
Keywords: Research 2.0, Science 2.0, Web 2.0, Semantic Web, Social Media, Linked Data, Profiling, Twitter, Microblogs, Web Mining.

Abstract: Based upon findings and results from our recent research (De Vocht et al., 2011), we propose a generic framework concept for researcher profiling with application to the areas of "Science 2.0" and "Research 2.0". The intensive growth in the number of users of social networks such as Twitter [a] has generated a vast amount of information. Many previous works have shown that social network users produce valuable content for profiling and recommendations (Reinhardt et al., 2009; Java et al., 2007; De Vocht et al., 2011). Our research focuses on identifying and locating experts for a specific research area or topic. In our approach we apply semantic technologies (RDF [b], SPARQL [c]), common vocabularies (SIOC [d], FOAF [e], MOAT [f], Tag Ontology [g]) and Linked Data [h] (GeoNames [i], COLINDA [j]) (Berners-Lee, 2006; Bizer et al., 2012).

[a] http://www.twitter.com
[b] http://www.w3.org/TR/rdf-concepts/
[c] http://www.w3.org/TR/rdf-sparql-query/
[d] http://rdfs.org/sioc/spec/
[e] http://www.foaf-project.org/docs/specs
[f] http://moat-project.org/ontology
[g] http://www.holygoat.co.uk/projects/tags/
[h] http://linkeddata.org/
[i] http://www.geonames.org/
[j] http://datahub.io/dataset/colinda
1 INTRODUCTION
The emergence of the Social Web has given rise to many communities that produce web content. However, the information generated in them still resides in isolated "data silos". The main reason for this is the lack of standardized approaches for data interlinking. Semantic Web technology has a well-defined stack in which applying common semantic vocabularies to model data, such as SIOC (Semantically Interlinked Online Communities) (Breslin et al., 2005) and FOAF (Friend-Of-A-Friend), leads to the generation of interlinked and semantically rich knowledge bases (Bojars et al., 2008). This knowledge is built upon user profiles and the content they produce. Structuring, for example, microblog information offers potential for qualitative mining of such data.

The methodology proposed in this paper relies on three main steps. The first step is called "triplification" or "RDFization": data is extracted and annotated using the vocabularies SIOC, FOAF, MOAT and Tag Ontology. In the second step, the RDF triples resulting from this process are stored and made accessible as linked graph instances. The final step is the publication of the data in various formats via a SPARQL endpoint in order to provide data for mining, which is state-of-the-art practice in the Semantic Web domain (Bizer et al., 2012; Tummarello et al., 2007; De Vocht et al., 2011). Vocabularies that model the domain context, used at the same time for structuring and description, enable more profound insights into the nature of the RDFized data.

Draft version, originally published in: Softic, S., Ebner, M., De Vocht, L., Mannens, E., Vande Walle, R.: A Framework Concept for Profiling Researchers on Twitter using the Web of Data. Proceedings of the 9th International Conference on Web Information Systems and Technologies (WEBIST) 2013, SciTePress 2013, Karl-Heinz Krempels, Alexander Stocker (Eds.), pp. 447-452, ISBN 978-989-8565-54-9, Aachen, Germany, 8 - 10 May, 2013.

Twitter, the best-known microblog, produces 190 million tweets and 1.6 billion search queries each day [1] (2012). Furthermore, it is widely accepted in the scientific community for communication, e.g. at conferences or for discussion purposes (Boyd et al., 2010; Jansen et al., 2009; Zhao and Rosson, 2009; Ebner et al., 2011), which makes it a reliable base for the researcher profiling process. However, the Twitter API has a limitation: a single user timeline includes only the last 250 tweets. In order to cover researchers who tweet more often, an alternative that also includes earlier tweets must be provided. As a possible solution to overcome the limits of the Twitter API, a tool called Grabeeter [2] (Mühlburger et al., 2010) has been implemented by Graz University of Technology. This application serves to preserve social data from Twitter. At the time of writing, Grabeeter includes about 1700 profiles, mostly from researchers and students, and contains around 6 million tweets.

This paper describes the concept architecture for the researcher profiling framework. It aims at gaining more knowledge and mining usable data out of the social context of microblogs for researcher profiling, using findings from (Reinhardt et al., 2009; Java et al., 2007; Letierce et al., 2010; Boyd et al., 2010; Honeycutt and Herring, 2009; De Vocht et al., 2011).
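The triplification step named in the methodology can be illustrated with a minimal sketch. It is not the authors' implementation: the tweet dictionary layout, the `http://twitter.com/...` resource IRIs and the choice of `sioc:Post`, `sioc:content`, `sioc:has_creator`, `dcterms:created`, `sioc:UserAccount` and `foaf:accountName` as the terms to emit are assumptions made for illustration. The output is plain N-Triples text, so no RDF library is required.

```python
from datetime import datetime, timezone

# Namespace IRIs of the vocabularies; which terms to emit is an assumption.
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def literal(value, datatype=None):
    """Serialize a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def triplify_tweet(tweet):
    """Turn one tweet (hypothetical dict layout) into a list of N-Triples lines."""
    # Hypothetical IRI scheme for accounts and posts.
    account = f"<http://twitter.com/{tweet['user']}>"
    post = f"<http://twitter.com/{tweet['user']}/status/{tweet['id']}>"
    created = literal(tweet["created"].isoformat(), XSD_DATETIME)
    return [
        f"{post} <{RDF_TYPE}> <{SIOC}Post> .",
        f"{post} <{SIOC}content> {literal(tweet['text'])} .",
        f"{post} <{DCTERMS}created> {created} .",
        f"{post} <{SIOC}has_creator> {account} .",
        f"{account} <{RDF_TYPE}> <{SIOC}UserAccount> .",
        f"{account} <{FOAF}accountName> {literal(tweet['user'])} .",
    ]


example = {
    "id": 1,
    "user": "alice",
    "text": 'Talk on "Linked Data" at #webist2013',
    "created": datetime(2013, 5, 8, 10, 0, tzinfo=timezone.utc),
}
for line in triplify_tweet(example):
    print(line)
```

Lines produced this way could be loaded into any triple store that exposes a SPARQL endpoint, which corresponds to the second and third steps of the methodology.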
2 RELATED WORK
The importance of microblogs has grown significantly in recent years (Zhao and Rosson, 2009). The most favored among them is Twitter, which has induced a new culture of communication (McFedries, 2007; Java et al., 2007). Twitter messages, restricted to 140 characters, are comparable to an internet-based short message service. Java et al. (2007) identified four main reasons why people use Twitter: daily chat, conversation, sharing information and reporting news. Twitter generates a vast number of tweets and search queries each day. According to recent reports [3] (2012), Twitter has over 225 million users, around 50 million of whom use it every day. This makes Twitter worth researching in more detail (Kwak et al., 2010). The use of Twitter at conferences helps to increase information awareness around the event, and it supports spontaneous conversation between conference participants, which can be used for networking and exchange of experience. Nowadays, conference-related Twitter streams based upon a hashtag search very often reflect what is going on at the actual event (Reinhardt et al., 2009). Twitter info walls placed at the conference location also support the conference administration, communication and discussion between the scientific tracks and sessions (Boyd et al., 2010; Jansen et al., 2009; Zhao and Rosson, 2009; Ebner et al., 2011). Applied in this manner, microblogging becomes a valuable reporting and exchange service. This finding has also been confirmed by various earlier publications (Reinhardt et al., 2009; Ebner et al., 2010).

Communication patterns in microblogs are easily mappable into a tripartite structure (De Vocht et al., 2011; Mika, 2005). Tripartite relations between data correspond to the basic idea of the RDF framework and graph-based data relations. Regarding Twitter, there have recently been some efforts, such as Semantic Tweet [4], to bring the data about Twitter users into a semantic form. In current research efforts, the widely used FOAF (Friend-Of-A-Friend) vocabulary is recommended for mapping relations between users, and it is considered in our architecture paradigm. For describing posts and the relations around microblogs, such as topic, author and content, the Semantic Web community offers a vocabulary called SIOC (Semantically Interlinked Online Communities) (Bojars et al., 2008; Breslin et al., 2005) along with Dublin Core [5]. For dealing with tags, MOAT (Passant, 2008) and the Tag Ontology have become established as good ontologies (Softic et al., 2009). Currently there are also some scientific projects that address the issue of semantic microblogging platforms. The most remarkable of them is SMOB (Semantic MicrOBlogging), recently also known as SMOB2 (Passant et al., 2010; Passant et al., 2008). It provides a SPARQL API and relies on vocabularies such as FOAF, SIOC, MOAT and OPO (Online Presence Ontology) [6]. Additionally, it offers interfaces to semantic search engines such as Sindice [7] and to the Linked Open Data (LOD) Cloud. The Twitter-based User Modeling Service (TUMS) [8] infers semantic user profiles from tweet messages. This platform provides topic detection and entity extraction for tweets. Furthermore, it allows enrichment by linking tweets to news articles related to their context (Tao et al., 2011). In line with these emerging trends, there are proven domain vocabularies such as FOAF, SIOC, DC (Dublin Core) [9], MOAT and the Tag Ontology, and standard semantic retrieval protocols such as SPARQL, provided by the Semantic Web community, which can be used for the semantic description and querying of semantically enriched microblog data from Twitter (Mendes et al., 2010; Softic et al., 2010). For the description of conference data through label, description, start and end date and location, the SWRC Ontology (Sure et al., 2005) together with the GeoNames Ontology covers all needs for COLINDA as the primary mining source.

The Linked Data movement (Berners-Lee, 2006) has meanwhile turned its result, the LOD Cloud (Linking Open Data Cloud) (Bojars et al., 2008; Bizer et al., 2012), into a reliable source of graph-based data offering data sets such as GeoNames. GeoNames is a semantic version of a location service. For identifying conferences, a linked data set called COLINDA (COnference LInked DAta) [10] offers an appropriate SPARQL endpoint [11]. COLINDA is linked to GeoNames and contains information about the conference name, label, description, location and the time when the event took place. Similar data sets also exist for books, publications, science etc. Linking semantic sources using the simple principles described in (Berners-Lee, 2006; Bizer et al., 2012) turns the web into a large database that is available not only to humans but also to intelligent agents (Bojars et al., 2008). Bringing implicit knowledge from Twitter data into this infrastructure would enhance the LOD Cloud and offer a solid information base for research on Research 2.0 and Science 2.0 issues.

[1] http://thesocialskinny.com/100-social-media-statistics-for-2012/
[2] http://grabeeter.tugraz.at
[3] http://thesocialskinny.com/100-social-media-statistics-for-2012/
[4] http://semantictweet.com/
[5] http://dublincore.org/documents/dcmi-terms/
[6] http://online-presence.net/ontology.php
[7] http://sindice.com/
[8] http://wis.ewi.tudelft.nl/tums/
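To make the role of a SPARQL endpoint such as the one offered by COLINDA more concrete, the following sketch builds a query and the corresponding request URL. Several points are assumptions rather than facts from the paper: that the endpoint accepts the standard SPARQL protocol `query` parameter over HTTP GET, that conferences carry an `rdfs:label`, and that the endpoint supports the SPARQL 1.1 functions `CONTAINS` and `LCASE`. The sketch only constructs the request and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint address as given in the paper's footnote.
COLINDA_ENDPOINT = "http://data.colinda.org/endpoint.php"


def conference_query(label_fragment, limit=10):
    """Build a SELECT query for resources whose rdfs:label contains the fragment.

    Assumes labels are modelled with rdfs:label and that the endpoint
    understands SPARQL 1.1 string functions.
    """
    # Escape backslashes and quotes so the fragment is a valid SPARQL string.
    safe = label_fragment.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?conference ?label WHERE {\n"
        "  ?conference rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{safe}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as a GET request following the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})


query = conference_query("WEBIST")
print(query)
print(request_url(COLINDA_ENDPOINT, query))
```

A hashtag observed in a tweet could be passed as `label_fragment` to look for a matching conference, although matching on labels alone is a simplification.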
[9] http://dublincore.org/documents/dces/
[10] http://www.colinda.org
[11] http://data.colinda.org/endpoint.php

3 ARCHITECTURE
3.1 Use Case and Design Specification
The main use case for the framework is illustrated by "the conference case" depicted in Figure 1, which was already presented in our previous work (De Vocht et al., 2011). The idea of this scenario rests on the fact that researchers are interested in very specific topics and events and that most of them report about these events via blogs or tweets (Reinhardt et al., 2009; Ebner et al., 2010), which creates huge opportunities for profiling.

[Figure 1 diagram labels: Researchers, Researcher Profiling, Researcher (User), User Model, Event Model, Scientific Conferences, Resource, Profiler/Analyzer]
Figure 1: Use Case for "Researcher Profiling" as proposed in (De Vocht et al., 2011).

Agile development suggests working with use cases. The framework has to support at least the "Researcher Profiling" application that meets the requirements of the use case presented in Figure 1. According to current research on semantic extraction frameworks for data mining (Softic et al., 2010), the conceptual design should include three basic layers: a data extraction layer, an interlinking layer and an analysis layer.
Extraction layer: Extracts data from several data sources, then describes them and relates them to a specific data context using the ontologies.
Interlinking layer: Is fed with annotated data (triples) and creates a SPARQL endpoint for it. It is responsible for requesting more data if needed for a certain information query. Furthermore, it interprets and handles high-level queries and translates them from/to SPARQL.
Analysis layer: Delivers the results from the interlinking layer, adding some metrics to rank and evaluate the returned results.
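How the interlinking and analysis layers could cooperate is sketched below under strong simplifications. An in-memory list of triples with wildcard pattern matching stands in for the SPARQL endpoint, and the ranking metric (number of posts per author carrying a given tag) is a hypothetical example, since the text does not specify which metrics the analysis layer uses. The predicate names `hasCreator` and `taggedWith` are shortened placeholders, not vocabulary terms.

```python
from collections import Counter


def match(triples, s=None, p=None, o=None):
    """Interlinking stand-in: return triples matching a pattern (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def rank_experts(triples, tag):
    """Analysis stand-in: rank authors by the number of their posts with the tag."""
    tagged_posts = {s for s, _, _ in match(triples, p="taggedWith", o=tag)}
    counts = Counter(author
                     for post, _, author in match(triples, p="hasCreator")
                     if post in tagged_posts)
    return counts.most_common()


# Annotated data as it might arrive from the extraction layer (made-up sample).
store = [
    ("post1", "hasCreator", "alice"), ("post1", "taggedWith", "linkeddata"),
    ("post2", "hasCreator", "alice"), ("post2", "taggedWith", "linkeddata"),
    ("post3", "hasCreator", "bob"),   ("post3", "taggedWith", "linkeddata"),
    ("post4", "hasCreator", "bob"),   ("post4", "taggedWith", "elearning"),
]
print(rank_experts(store, "linkeddata"))
```

In a full system the `match` function would be replaced by queries against the SPARQL endpoint created by the interlinking layer.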
3.2 Extraction
The extraction layer collects data about a profile from Twitter and from Grabeeter (Mühlburger et al., 2010) and maps them into two models: the "User Microblogs Model" and the "User Profile Model". The "User Microblogs Model" gathers all data from the tweets it gets from Grabeeter and describes them semantically using the SIOC and Dublin Core vocabularies. The "User Profile Model" is built upon Twitter user profile data with the FOAF ontology. If a user does not exist in Grabeeter, the user profile and microblogs are retrieved directly from Twitter. The data from Twitter is retrieved with the help of the Twitter API [12]. Finally, hashtags are identified by simple regular expressions and linked to the microblog data (text, creation date, author) and the user profile (author data, social connections) using the Tag Ontology. These models serve a component named "Triplifier", which creates semantic instances of the graph by assembling the data using relevant entities from the ontologies. The result of the extraction is a collection of semantically annotated data in the form of triples that describe the tweet-wise content with

[12] https://dev.twitter.com/docs/api
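The identification of hashtags by simple regular expressions, as mentioned in this section, might look as follows. The concrete pattern is an assumption, since the paper does not state the expression it uses. The lookbehind keeps URL fragments such as `page#section` from being counted as hashtags, and tags are lowercased and de-duplicated while preserving their order of appearance.

```python
import re

# Assumed pattern: '#' not preceded by a word character, then word characters.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text):
    """Return the distinct hashtags of a tweet, lowercased, in order of appearance."""
    seen = []
    for tag in HASHTAG.findall(text):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


tweet = ("Great keynote at #WEBIST2013 on #LinkedData, "
         "see http://example.org/page#section #webist2013")
print(extract_hashtags(tweet))
```

Each extracted tag could then be attached to the tweet and to its author in the two models described above.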
