Video Retrieval
Language Translation Technology
Recommender Systems
Online Social Networks
Product Review Sites
Email Spam Detection
Search Results Clustering
Search Query Suggestions
Many Other Tasks …
• Novelty Detection
• Book Detection
• Blog Recommendation
• Plagiarism Detection
• Opinion Detection
• Author Profiling
• etc.
An Example of an IR Problem
• How to search in a document: grepping (a linear scan of the text)
• Challenges for grepping
– Data collections are too large to scan on every query
– Proximity queries such as “Pakistan NEAR Zardari” are not practical
with grepping
– Not suitable for ranked retrieval
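A minimal sketch of grepping as a linear scan, assuming a hypothetical three-document collection held in memory and a naive tokenizer. The `grep` and `grep_near` helpers and the window size `k` are illustrative names, not part of any real tool. Every query re-reads every document, and the answer is only a match/no-match list with no ordering.

```python
import re

# Hypothetical toy collection: document ID -> text.
DOCS: dict[int, str] = {
    1: "Zardari visited Pakistan last week.",
    2: "Pakistan won the cricket match.",
    3: "The president of Pakistan, Zardari, spoke today.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letter characters."""
    return re.findall(r"[a-z]+", text.lower())


def grep(term: str, docs: dict[int, str]) -> list[int]:
    """Linear scan: return IDs of documents whose tokens include the term."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]


def grep_near(a: str, b: str, k: int, docs: dict[int, str]) -> list[int]:
    """Linear scan for 'a NEAR b': both terms within k token positions.

    Every document is re-tokenized on every query, so the cost grows with
    the size of the whole collection, not with the number of matches.
    """
    a, b = a.lower(), b.lower()
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        if any(abs(i - j) <= k for i in pos_a for j in pos_b):
            hits.append(doc_id)
    return hits


print(grep("pakistan", DOCS))                      # [1, 2, 3]
print(grep_near("pakistan", "zardari", 2, DOCS))  # [1, 3]
```

Both matching documents for the NEAR query come back as an unordered set of hits; nothing in the scan says which one is more relevant.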
Boolean IR
• Query terms are combined with AND, OR and NOT
• Ad hoc IR
– In ad hoc IR, the system aims to provide documents from within the
collection that are relevant to an arbitrary user information need,
communicated to the system by means of a one-off, user-initiated
query.
• Information Need / Query
• Relevance
• Effectiveness (the quality of search results)
– Precision and recall
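A minimal sketch of Boolean retrieval and the two effectiveness measures, assuming a hypothetical five-document collection whose postings are stored as Python sets. The terms, document IDs and relevance judgments are made up for illustration. AND maps to set intersection, OR to union, and NOT to the complement against the whole collection. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved.

```python
# Hypothetical toy index: term -> set of IDs of the documents containing it.
POSTINGS: dict[str, set[int]] = {
    "brutus": {1, 2, 4},
    "caesar": {1, 2, 3, 5},
    "calpurnia": {2},
}
ALL_DOCS: set[int] = {1, 2, 3, 4, 5}


def docs_with(term: str) -> set[int]:
    """Documents containing the term (empty set if the term is unknown)."""
    return POSTINGS.get(term, set())


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = hits / |retrieved|, recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# brutus AND caesar AND NOT calpurnia: AND is intersection, NOT is the complement.
q1 = docs_with("brutus") & docs_with("caesar") & (ALL_DOCS - docs_with("calpurnia"))
# brutus OR calpurnia: OR is union.
q2 = docs_with("brutus") | docs_with("calpurnia")

# Hypothetical relevance judgments for the user's information need.
RELEVANT: set[int] = {1, 4}

print(sorted(q1), precision_recall(q1, RELEVANT))  # [1] (1.0, 0.5)
print(sorted(q2), precision_recall(q2, RELEVANT))  # [1, 2, 4] (0.6666666666666666, 1.0)
```

The two queries illustrate the usual trade-off: the narrower AND query is fully precise but misses document 4, while the broader OR query finds every relevant document at the cost of also returning document 2.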
Some Concepts
• Term–Document Incidence Matrix: extremely
sparse, i.e. very few non-zero entries. A much better
representation is to record only the things that do occur,
that is, the 1 positions.
• Inverted Index
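A minimal sketch contrasting the two representations, assuming a hypothetical four-document collection and a naive tokenizer. The matrix stores a cell for every term-document pair, while the inverted index keeps only the 1 positions as sorted postings lists. An AND query can then intersect two postings lists with a linear merge; that merge is the standard postings-intersection technique, used here as an illustration rather than as the method of any particular system.

```python
import re
from collections import defaultdict

# Hypothetical toy collection: document ID -> text.
DOCS: dict[int, str] = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letter characters."""
    return re.findall(r"[a-z]+", text.lower())


def incidence_matrix(docs: dict[int, str]) -> tuple[list[str], list[int], list[list[int]]]:
    """Term-document incidence matrix: one row per term, one 0/1 column per document."""
    doc_terms = {d: set(tokenize(text)) for d, text in docs.items()}
    vocab = sorted(set().union(*doc_terms.values()))
    doc_ids = sorted(docs)
    matrix = [[1 if t in doc_terms[d] else 0 for d in doc_ids] for t in vocab]
    return vocab, doc_ids, matrix


def inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Record only the 1 positions: term -> sorted list of document IDs (postings)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for t in tokenize(text):
            index[t].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """AND query: merge two sorted postings lists in O(len(p1) + len(p2)) time."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


vocab, doc_ids, matrix = incidence_matrix(DOCS)
index = inverted_index(DOCS)

cells = len(vocab) * len(doc_ids)
ones = sum(sum(row) for row in matrix)
stored = sum(len(p) for p in index.values())
print(f"matrix cells: {cells}, non-zero: {ones}, postings stored: {stored}")
# matrix cells: 36, non-zero: 20, postings stored: 20
print(intersect(index["home"], index["july"]))  # [2, 3, 4]
```

In this toy collection 20 of the 36 cells are non-zero, so the saving is small. In a realistic collection with hundreds of thousands of terms almost every cell is 0, and storing only the postings is what keeps the index manageable.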