Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Look up keyword
Like this
9Activity
0 of .
Results for:
No results containing your search query
P. 1
Keyword Extraction for Social Snippets

Keyword Extraction for Social Snippets

Ratings: (0)|Views: 937|Likes:
Published by Facebook
Today, a huge amount of text is being generated for social purposes on social networking services on the Web. Unlike traditional
documents, such text is usually extremely short and tends to be informal. Analysis of such text benefit many applications such as
advertising, search, and content filtering. In this work, we study
one traditional text mining task on such new form of text, that is
extraction of meaningful keywords. We propose several intuitive
yet useful features and experiment with various classification models. Evaluation is conducted on Facebook data. Performances of
various features and models are reported and compared
Today, a huge amount of text is being generated for social purposes on social networking services on the Web. Unlike traditional
documents, such text is usually extremely short and tends to be informal. Analysis of such text benefit many applications such as
advertising, search, and content filtering. In this work, we study
one traditional text mining task on such new form of text, that is
extraction of meaningful keywords. We propose several intuitive
yet useful features and experiment with various classification models. Evaluation is conducted on Facebook data. Performances of
various features and models are reported and compared

More info:

Published by: Facebook on Jun 09, 2011
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

07/25/2012

pdf

text

original

 
Keyword Extraction for Social Snippets
Zhenhui Li
UIUCzli28@uiuc.edu
Ding Zhou
Facebook Inc.dzhou@facebook.com
Yun-Fang Juan
Facebook Inc.yunfang@facebook.com
Jiawei Han
UIUChanj@cs.uiuc.edu
ABSTRACT
Today, a huge amount of text is being generated for social pur-poses on social networking services on the Web. Unlike traditionaldocuments, such text is usually extremely short and tends to be in-formal. Analysis of such text benefit many applications such asadvertising, search, and content filtering. In this work, we studyone traditional text mining task on such new form of text, that isextraction of meaningful keywords. We propose several intuitiveyet useful features and experiment with various classification mod-els. Evaluation is conducted on Facebook data. Performances of various features and models are reported and compared.
Categories and Subject Descriptors:
H.3.1 [Content AnalysisandIndexing]: Abstractingmethods, H.4.m[InformationSystems]:Miscellaneous.
General Terms:
Algorithms, Experimentation, Performance.
Keywords:
keyword extraction, social snippet, online advertising.
1. INTRODUCTION
Social networking services, such as Facebook, Twitter, and MSNMessenger, are developing at an amazing speed, leading to produc-tion of a special kind of user-generated text for social purposes.Usually, users generate such text to broadcast to friends or follow-ers their current status, recent news, or interesting events. We namesuch text
social snippets
. Status updates on Facebook and Twittermessages are representative examples of social snippets.Social snippets are valuable media to mine users’ recent interest.For example, if someone publishes his/her social snippet saying“is excited to receive my Wii today”. It is a sign that this per-son is interested in Wii games. One way to discover such interestis to extract meaningful keywords from social snippets. Extractedkeywords can be used for many applications. They can be treatedas personal social tags to better retrieve user-generated informa-tion. They can facilitate social recommendations. Most impor-tantly, they are essential for advertisement targeting. For example,advertisers can target the users who might be interested in videogames using the keyword “Wii”.In this work, we seek to develop the effective keywords extrac-tion methods for social snippets in the form of a classification task.There have been many related works on keyword extraction fromdocuments [4, 1] and web pages [3]. However, the special charac-teristics of social snippets make the keyword extraction a new and
The work was supported in part by NSF IIS-0905215 and theAFOSR MURI award FA9550-08-1-0265.
Copyright is held by the author/owner(s).
WWW 2010,
April 26–30, 2010, Raleigh, North Carolina, USA.ACM 978-1-60558-799-8/10/04.
even more challenging problem. It is statistically shown that socialsnippets are extremely short and tend to be more informal. Ac-cordingly, insights from related works on traditional data no longerhold true. In the following, we report a set of features to measurethe importance of keywords and compare the performances amongvarious classification models. We compare our approach with sev-eral other keyword extraction systems, such as KEA [1] and Yahoo!keyword extraction system. All the experiments are conducted ona sample of Facebook data.
2. KEYWORDEXTRACTIONALGORITHM
For each social snippet, we first tokenize it and generate uni-grams and bigrams to be considered as keyword candidates. Then,a set of features are calculated to represent each candidate. Wetrain a classification model based on the labeled keywords of socialsnippets. And finally the keyword candidates with highest scoresthrough classification model are returned.Here are the features:
TFIDF
: information retrieval feature.
TFIDF 
is a commonlyused feature in information retrieval. Intuitively, the word that ap-pears more often in a document but not very often in the corpus ismore likely to be a keyword. However, in our problem, this featurecould be less useful. Because in short social snippets, the TF formost keyword candidates is
1
. Frequency does not help to distin-guish real keywords from others.
lin
: linguistic feature.
In most cases, nouns have higher proba-bility to be keywords. For a word, feature
lin
is the number of itsnoun POS tags divided by the total number of its POS tags. Thisfeature is not affected by the “short” and “informal” properties of social snippets so we expect this feature to be an important one forclassification.
 pos
: relative position.
pos
is defined as the relative positionwithin a social snippet. For the bigrams, we consider the positionof the first word.
lenText
: length of social snippet.
lenText 
of a social snippet iscalculated by the number of words (including stopwords) in it. Forall the keyword candidates in one social snippet, they are assignedwith the same value.
 DF
:documentfrequency.
A hot and trendy topic is more likelyto be a keyword in social snippets. Feature
DF 
reflects the popu-larity of a keyword.
 capital 
: capitalization.
This feature has a binary value showingwhether there is a upper case letter in a keyword candidate.After applying the classification model on testing data, we canget a relevance score for each keyword candidate. Finally, we needto return the “top” candidates with highest score. The most com-mon way to select the “top” ones is to choose top-
k
candidates.However, we observe from our experiment dataset that about
25%

Activity (9)

You've already reviewed this. Edit your review.
1 hundred reads
1 thousand reads
John Gury added this note
Doing it for finance is great but how about in French? Can can can can can can can can can can FRENCH CANCAN http://www.youtube.com/watch?v=G3li4h...
John Gury added this note
"One nice property of social snippets we did not make use of is theunderlying social networks, which we consider as a promising fu-ture direction." That is for sure
freejack2006 liked this
Billy Nousis liked this
Vijay Harrell liked this

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->