
Personalized Topic-Based Tag Recommendation (2012)
Ralf Krestel, Peter Fankhauser

Presented by: Mostafa Heidary

Keywords: Tag recommendation · Personalization · Language models · Topic models

Outline
 Introduction
   What is Folksonomy?
 Topic Modeling
   Introduction to LDA
 Using LDA for Tag Recommendation
   Personalized tag recommendation using LDA
 Evaluation
 Conclusion
 What plan do I have?


What is Folksonomy? [1]
 A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content.
 Folksonomy, a term coined by Thomas Vander Wal, is a portmanteau of folk and taxonomy. [2]
 If you want to create a folksonomy and an associated tag cloud, you can set up a free account in a matter of minutes at Delicious or Diigo.

1. Folksonomy definition on Wikipedia: http://en.wikipedia.org/wiki/Folksonomy
2. [Van 2007] Vander Wal, T.: Folksonomy Coinage and Definition. Vanderwal.net, 2007.


Topic Modeling
 Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives:
1. Uncover the hidden topical patterns that pervade the collection.
2. Annotate the documents according to those topics.
3. Use the annotations to organize, summarize, and search the texts.
Model the evolution of topics over time

[Figure: topic evolution over time for "Theoretical Physics" and "Neuroscience"]


Introduction to LDA
 Latent Dirichlet Allocation (LDA) is a common method of topic modeling.
 The general idea is based on the hypothesis that a person writing a document has certain topics in mind. To write about a topic then means to pick a word with a certain probability from the pool of words of that topic. A whole document can then be represented as a mixture of different topics.
 LDA helps to explain the similarity of data by grouping features of this data into unobserved sets.

Introduction to LDA
 Suppose you have the following set of sentences:
1. I like to eat broccoli and bananas.
2. I ate a banana and spinach smoothie for breakfast.
3. Chinchillas and kittens are cute.
4. My sister adopted a kitten yesterday.
5. Look at this cute hamster munching on a piece of broccoli.
 What is latent Dirichlet allocation? It's a way of automatically discovering the topics that these sentences contain.

Introduction to LDA
 Given the five sentences above, LDA might discover:
 Sentences 1 and 2: 100% Topic A
 Sentences 3 and 4: 100% Topic B
 Sentence 5: 60% Topic A, 40% Topic B
 Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, … (at which point, you could interpret Topic A to be about food)
 Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, … (at which point, you could interpret Topic B to be about cute animals)

LDA Model
 LDA assumes that documents are produced in the following fashion. When writing each document:
1. Decide on the number of words N the document will have.
2. Choose a topic mixture for the document (a distribution over a fixed set of topics).
3. Generate each word w_i in the document by:
    first picking a topic (according to the multinomial distribution sampled above);
    then using that topic to generate the word itself (according to the topic's multinomial distribution). For example, if we selected the food topic, we might generate the word "broccoli" with 30% probability, "bananas" with 15% probability, and so on.
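The generative story above can be sketched in a few lines of Python. This is a toy illustration, not code from the slides: the topic names and word probabilities are made up to match the food / cute-animals example.

```python
import random

random.seed(0)

# Toy topics: the word distributions are illustrative, not estimated from data.
topics = {
    "food":         {"broccoli": 0.5, "bananas": 0.3, "cherries": 0.2},
    "cute_animals": {"kittens": 0.4, "panda": 0.3, "adorable": 0.3},
}

def generate_document(n_words, topic_mixture):
    """Generate a bag-of-words document following the LDA generative story."""
    words = []
    for _ in range(n_words):
        # 1. Pick a topic according to the document's topic mixture.
        topic = random.choices(list(topic_mixture),
                               weights=list(topic_mixture.values()))[0]
        # 2. Pick a word from that topic's word distribution.
        dist = topics[topic]
        words.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return words

doc = generate_document(5, {"food": 0.5, "cute_animals": 0.5})
print(doc)  # five words drawn from a 50/50 mixture of the two topics
```

Because words are drawn independently given their topics, the result is an unordered bag of words, exactly the property noted on the next slide.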


LDA Model
 Example: generating a five-word document D:
 Pick 5 to be the number of words in D.
 Decide that D will be 1/2 about food and 1/2 about cute animals.
 Pick the first word from the food topic, which gives you the word "broccoli".
 Pick the second word from the cute animals topic, which gives you "panda".
 Pick the third word from the cute animals topic, giving you "adorable".
 Pick the fourth word from the food topic, giving you "cherries".
 Pick the fifth word from the food topic, giving you "eating".
 So the document generated under the LDA model will be "broccoli panda adorable cherries eating" (note that LDA is a bag-of-words model).


Learning
 In reality, we only observe the documents; our goal is to infer the underlying hidden structure: the topics, the per-document topic mixtures, and the per-word topic assignments.


Learning
 Fix K, some number of topics.
 Go through each document, and randomly assign each word in the document to one of the K topics.
 Notice that this random assignment already gives you both topic representations of all the documents and word distributions of all the topics (albeit not very good ones).
 So to improve on them: for each document d, go through each word w in d …


Learning
 For each topic t, compute two things:
1. p(topic t | document d): the probability of picking a term from topic t in document d.
2. p(word w | topic t): the probability of word w within topic t.
 In this step, we are assuming that all topic assignments except for the current word in question are correct.
 Then reassign w a new topic, choosing topic t with probability p(topic t | document d) * p(word w | topic t) (this is essentially the probability that topic t generated word w).


Learning
 After repeating the previous step a large number of times, you will eventually reach a roughly steady state where your assignments are pretty good.
 Use these assignments to estimate the topic mixtures of each document and the words associated with each topic.


[Figure: top words for four example latent topics, topic1–topic4]

Learning
 A document d is modeled as a mixture over latent topics:
 P(t_i | d) = Σ_j P(t_i | z_i = j) · P(z_i = j | d)
 P(t_i | d) is the probability of the ith term for a given document d, and z_i is the latent topic.
 P(t_i | z_i = j) is the probability of t_i within topic j.
 P(z_i = j | d) is the probability of picking a term from topic j in the document.
 LDA estimates the topic–term distribution P(t | z) and the document–topic distribution P(z | d) from an unlabeled corpus of documents, using Dirichlet priors for the distributions and a fixed number of topics.
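The marginalization over topics can be written out directly. A minimal sketch, with made-up two-topic distributions (the numbers are illustrative, not from the paper):

```python
# P(t_i | d) = sum_j P(t_i | z_i = j) * P(z_i = j | d)
# Illustrative distributions for a 2-topic model (not estimated from data).
p_term_given_topic = {          # P(t | z): topic-term distribution
    0: {"broccoli": 0.3, "kitten": 0.0},
    1: {"broccoli": 0.0, "kitten": 0.4},
}
p_topic_given_doc = {0: 0.6, 1: 0.4}    # P(z | d): document-topic distribution

def p_term_given_doc(term):
    """Marginalize over the latent topics."""
    return sum(p_term_given_topic[j].get(term, 0.0) * p_topic_given_doc[j]
               for j in p_topic_given_doc)

print(p_term_given_doc("broccoli"))  # 0.3 * 0.6 + 0.0 * 0.4 = 0.18
```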

Learning
 Gibbs sampling is one possible approach to this end: it iterates multiple times over each term t_i in document d_i, and samples a new topic j for the term based on the aforementioned probability.
 C_TZ maintains a count of all topic–term assignments.
 C_DZ counts the document–topic assignments.
 z_{-i} represents all topic–term and document–topic assignments except the current assignment z_i for term t_i.
 α and β are the (symmetric) hyperparameters for the Dirichlet priors, serving as smoothing parameters for the counts.
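A minimal collapsed Gibbs sampler built from exactly these counts. The count names C_TZ, C_DZ and the hyperparameters alpha, beta follow the slide; the toy corpus and everything else are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

random.seed(0)

docs = [["broccoli", "bananas", "broccoli"],   # toy "documents" of terms
        ["kitten", "cute", "kitten"],
        ["broccoli", "cute"]]
K, alpha, beta = 2, 0.1, 0.01                  # topics and Dirichlet priors
vocab = sorted({t for d in docs for t in d})
V = len(vocab)

# Random initialization of topic assignments, with counts C_TZ and C_DZ.
z = [[random.randrange(K) for _ in d] for d in docs]
C_TZ = defaultdict(int)   # (term, topic) -> count of topic-term assignments
C_DZ = defaultdict(int)   # (doc, topic)  -> count of document-topic assignments
C_Z = defaultdict(int)    # topic -> total assignments (normalizer for C_TZ)
for di, d in enumerate(docs):
    for wi, t in enumerate(d):
        j = z[di][wi]
        C_TZ[t, j] += 1; C_DZ[di, j] += 1; C_Z[j] += 1

for _ in range(200):                            # Gibbs sweeps over the corpus
    for di, d in enumerate(docs):
        for wi, t in enumerate(d):
            j = z[di][wi]                       # exclude current assignment (z_-i)
            C_TZ[t, j] -= 1; C_DZ[di, j] -= 1; C_Z[j] -= 1
            # P(z_i = j | z_-i, t_i, d_i) ∝ (C_TZ + β)/(C_Z + Vβ) * (C_DZ + α)
            weights = [(C_TZ[t, k] + beta) / (C_Z[k] + V * beta) *
                       (C_DZ[di, k] + alpha) for k in range(K)]
            j = random.choices(range(K), weights=weights)[0]
            z[di][wi] = j                       # record the newly sampled topic
            C_TZ[t, j] += 1; C_DZ[di, j] += 1; C_Z[j] += 1

# After sampling, the smoothed counts estimate the topic-term distribution P(t | z).
phi = {k: {t: (C_TZ[t, k] + beta) / (C_Z[k] + V * beta) for t in vocab}
       for k in range(K)}
print(phi)
```

The per-document factor (C_DZ + α) is left unnormalized in `weights` because its denominator is the same for every candidate topic, so it cancels when sampling.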


Using LDA for Tag Recommendation
 For tagging systems, the documents are resources r ∈ R, and each resource is described by tags t ∈ T assigned by users u ∈ U.
 Instead of documents composed of terms, we have resources composed of tags.
 To build an LDA model, we need resources and associated tags previously assigned by users.


Example of using AR

[Figure: example of tag recommendation using AR]

Example of using LDA

[Figure: top terms composing the latent topics "photography" and "howto"]


Personalized tag recommendation using LDA
 We need to rank possible tags t, given a resource and a user.
 P(t) can be estimated via the relative frequency of tag t in all bookmarks.
 On the one hand we use simple language models; on the other hand we use Latent Dirichlet Allocation, in order to also recommend tags for new resources and users, which have only few bookmarks available.


Language Model
 The language-model probability of a tag t for a resource r is its relative frequency:
 P_lm(t | r) = c(t, r) / Σ_t' c(t', r)
 where c(t, r) is the count of tag t in resource r.
 P_lm(t | u), the probability of user u using tag t, is determined in a similar way from all tags the user has assigned.
 For new resources and users, having only a few bookmarks available, the simple language model does not suffice for tag recommendation.
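Estimating this relative frequency from raw tag counts is a few lines; a minimal sketch (the bookmark data is made up for illustration):

```python
from collections import Counter

# Tag assignments for one resource, e.g. all tags users gave one bookmarked page.
resource_tags = ["photography", "howto", "photography", "tutorial"]

counts = Counter(resource_tags)                    # c(t, r)
total = sum(counts.values())                       # sum over all tags t'
p_lm = {t: c / total for t, c in counts.items()}   # P_lm(t | r)

print(p_lm["photography"])  # 2 / 4 = 0.5
```

The user-side model P_lm(t | u) is the same computation over all tags one user has assigned instead of one resource's tags.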


Latent Dirichlet Allocation (LDA)
 P_lda(t | r) follows the topic mixture introduced earlier: P_lda(t | r) = Σ_j P(t | z = j) · P(z = j | r).
 The estimation of P_lda(t | u) proceeds in the same way as the estimation of P_lda(t | r), by operating on the individual tag sets of users rather than resources.


Based on resource profiles

[Figure: top tags composing the latent topics "tech news" and "Flickr", based on resource profiles]

Based on user profiles

[Figure: top tags composing the latent topics "mac" and "do it yourself", based on user profiles]

Combining LM and LDA
 The two models are combined by linear interpolation with weight λ:
 P(t | r) = λ · P_lm(t | r) + (1 − λ) · P_lda(t | r)
 We have experimented with a broad range for λ, and achieved consistently good results for λ in the range [0.2 … 0.8].
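The interpolation itself is straightforward; a sketch assuming P_lm and P_lda have already been estimated (the distributions below are illustrative, not from the paper):

```python
def combine(p_lm, p_lda, lam=0.5):
    """Linear interpolation of language-model and LDA tag probabilities."""
    tags = set(p_lm) | set(p_lda)
    return {t: lam * p_lm.get(t, 0.0) + (1 - lam) * p_lda.get(t, 0.0)
            for t in tags}

# LM is sharp but sparse; LDA smooths scores over topically related tags.
p_lm = {"photography": 0.5, "howto": 0.25, "tutorial": 0.25}
p_lda = {"photography": 0.3, "camera": 0.2, "howto": 0.2, "diy": 0.3}

scores = combine(p_lm, p_lda, lam=0.5)
top = sorted(scores, key=scores.get, reverse=True)
print(top[0])  # "photography" ranks first: 0.5*0.5 + 0.5*0.3 = 0.4
```

Note how the LDA component lets tags such as "camera" and "diy", never observed for this resource under the LM, still receive nonzero scores.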



Evaluation

[Figure: results for one known bookmark and different algorithms on the Delicious dataset]


Conclusion
 We have explored user-centered and resource-centered approaches for personalized tag recommendation.
 We compared and employed a language modeling approach and an approach based on Latent Dirichlet Allocation.
 Even for non-textual resources like videos or audio, additional metadata could be exploited.



What plan do I have?
 First, implement this approach in C#.
 Implementation of the LM is simple; for LDA, I will use one of the implementations available on the web.
 I will use some Farsi datasets, such as news datasets.
 I plan to consider tag and time information, to recommend appropriate tags over time.
 The article N. Zheng and Q. Li, "A recommender system based on tag and time information for social tagging systems," Expert Systems with Applications, vol. 38, no. 4, pp. 4575–4587, Apr. 2011, will be useful for this approach.

Thanks
 Any questions?
