
CS4242 Tutorial 2: Social Media Analysis

1. Early IR systems treat each document as an independent entity and compute the similarity between each document and the query without considering links or relationships between documents. The current generation of commercial systems (starting with Google) incorporates link information, in the form of PageRank, into the computation of document relevance. In social media such as Twitter, messages are not totally independent either. A message may be sent: (i) as a reply to another message, or (ii) in response to a message posted by a person the sender is following. Explain how you would use this information to enhance retrieval performance. Justify your solutions. Note: a) If two messages are related, their content may be used to reinforce each other. b) How do you establish that a message is of type (ii) above?
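As a starting point for note (a), the idea that related messages reinforce each other can be sketched as a score-blending step. This is only one possible illustration, not the expected answer: the function name `reinforce_scores`, the weight `alpha`, and the choice to treat reply links as undirected are all assumptions made for the example.

```python
from collections import defaultdict


def reinforce_scores(base_scores, links, alpha=0.3):
    """Blend each message's own relevance score with the mean score
    of the messages it is linked to (hypothetical helper).

    base_scores: dict mapping message id -> query relevance score
    links: iterable of (message_id, related_message_id) pairs
    alpha: assumed weight given to the linked messages' scores
    """
    neighbours = defaultdict(set)
    for a, b in links:
        # Assumption: a reply link reinforces both messages equally.
        neighbours[a].add(b)
        neighbours[b].add(a)

    final = {}
    for msg_id, score in base_scores.items():
        linked = [base_scores[n] for n in neighbours[msg_id] if n in base_scores]
        if linked:
            mean_linked = sum(linked) / len(linked)
            final[msg_id] = (1 - alpha) * score + alpha * mean_linked
        else:
            # Messages with no known relations keep their original score.
            final[msg_id] = score
    return final


if __name__ == "__main__":
    scores = {"m1": 0.8, "m2": 0.2, "m3": 0.5}
    replies = [("m2", "m1")]  # m2 is a reply to m1
    print(reinforce_scores(scores, replies, alpha=0.5))
```

A short reply that barely matches the query on its own ("m2" here) is pulled up by the strongly matching message it replies to. Whether the original message should also be pulled down, as this symmetric sketch does, is a design choice worth justifying in your answer.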

2. The relationships between messages can also be used to enhance classification of social media messages. Explain how you would make use of relation information directly (by reinforcing the content of a related document) or indirectly to enhance classification.

3. In Lectures 5 & 6, we discussed several techniques to harvest new evolving terms from the set of relevant documents. The new terms harvested should have high correlation with the messages found in the current time window and low correlation with those found in previous time windows. a) Outline at least two methods to extract evolving terms. b) Describe the architecture of a working system that will continuously update the evolving keyword list based on the latest set of relevant documents.
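The criterion stated in Question 3 (high correlation with the current window, low correlation with previous windows) can be made concrete with a simple ratio of smoothed document frequencies. The sketch below is an assumption-laden illustration, not necessarily one of the methods from the lectures: the names `doc_freq` and `evolving_terms`, the whitespace tokenizer, the add-one smoothing, and the `min_count` threshold are all hypothetical choices.

```python
from collections import Counter


def doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # Assumption: naive lowercase whitespace tokenization.
        counts.update(set(doc.lower().split()))
    return counts


def evolving_terms(current_docs, previous_docs, top_k=5, min_count=2):
    """Rank terms by how much more often they appear in the current
    time window than in previous windows (hypothetical scoring)."""
    cur = doc_freq(current_docs)
    prev = doc_freq(previous_docs)
    n_cur, n_prev = len(current_docs), len(previous_docs)

    scores = {}
    for term, count in cur.items():
        if count < min_count:
            continue  # skip terms too rare to be reliable
        # Add-one smoothing keeps the ratio finite for unseen terms.
        p_cur = (count + 1) / (n_cur + 2)
        p_prev = (prev[term] + 1) / (n_prev + 2)
        scores[term] = p_cur / p_prev
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    previous = ["flood warning issued", "heavy rain today", "rain and traffic"]
    current = ["haze worsens in city", "haze psi rising", "rain and haze today"]
    print(evolving_terms(current, previous, top_k=3, min_count=1))
```

In a continuously running system, a function like this would be called once per time window, with the current window's relevant messages then folded into the history before the next call.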
