
Information Retrieval (SEO)

Content

 Information Retrieval.
 Four key elements.
 Two main Issues in IR.
 Important Components in IR.
 IR is NOT DB.
 IR is NOT NLP.
 Bag-of-Words.
 Boolean Retrieval Model.
 Term-Document Incidence Matrix.
Conversational Search
IR is a Technology to Connect People to Information
Information Retrieval

• An information retrieval (IR) system:

 is a set of algorithms that determines how relevant the displayed documents are to the searched query.
 In simple words, it sorts and ranks documents based on a user’s query.
 It represents the query and the document text in a uniform way, so that documents can be matched against the query and made accessible.
Is searching in videos IR?

Is spam filtering IR?


Two main Issues in IR

Effectiveness

• Need to find relevant documents.

Efficiency

• Need to find documents quickly.


Four key elements

1 D − Document representation.

2 Q − Query representation.

3 F − A framework to match and establish a relationship between D and Q.

4 R(q, dᵢ) − A ranking function that scores the similarity between the query q and document dᵢ, so that the most relevant documents are displayed.
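The four elements can be sketched in a few lines of Python. This is a minimal illustration under assumed choices, not a model given in the slides: D and Q are both represented as sets of lowercase words, the framework F matches them by the terms they share, and R(q, dᵢ) is assumed to be the number of shared terms. The helper names (`represent`, `score`, `rank`) and the sample documents are hypothetical.

```python
import re

# Hypothetical helpers; the representation and scoring scheme are assumptions.


def represent(text: str) -> set[str]:
    """D and Q: represent a document or query as a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: set[str], doc: set[str]) -> int:
    """R(q, d_i): assumed here to be the count of query terms found in the document."""
    return len(query & doc)


def rank(query_text: str, docs: list[str]) -> list[tuple[int, str]]:
    """F: match the query against every document and sort by R(q, d_i), highest first."""
    q = represent(query_text)
    scored = [(score(q, represent(d)), d) for d in docs]
    # sorted() is stable, so documents with equal scores keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "Boolean retrieval uses AND, OR and NOT.",
    "Information retrieval ranks documents for a query.",
    "Databases store structured records.",
]

for s, d in rank("information retrieval", docs):
    print(s, d)
```

Counting shared terms is the simplest possible choice for R; it is used here only to make the roles of D, Q, F and R concrete.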
Important Components in IR

1 Document
The element to be retrieved (not only text).
• Unique ID.
• Unstructured.
• N doc  collection.

 Examples:
Web pages, books, e-mails, pages, sentences.
Photos, videos, code.
People.
2 Query:
 Free text expressing the user’s information need.
 The user translates an information need into a query.
 User queries can be either formal or informal statements of what information is required.
 The same information need can be described by different queries:
 Are chatting apps secure?
 Live chat protection.

The same query can represent different information needs:

Apple. (the fruit or the company?)
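Under plain word matching, two queries for the same information need may share no terms at all. The sketch below compares the two example queries from the slide, assuming simple lowercase tokenization with no stemming; the helper name `terms` is hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    # Hypothetical helper: lowercase, split on non-alphanumerics, no stemming.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


q1 = terms("Are chatting apps secure?")
q2 = terms("Live chat protection.")

print(q1 & q2)  # set()  ("chatting" and "chat" are different strings)
```

A stemmer that reduced "chatting" to "chat" would produce one shared term, but the broader point stands: the same need can be phrased in different words.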
Query: Different Forms

 Web search.
 Image search.
 Music.
 Question.

3 Relevance

 Does item d match query q?


 Is item d relevant to query q?

 Relevance is a tricky notion:
• Will the user like it?
• Will it help the user achieve a task?
Information Need / Query / Relevance

 Information need
 Topic about which the user desires to know more.

 Query
 Representation of the information need.

 Relevance
 A document is relevant if it has value with respect to the information need.
IR is NOT DB
IR is NOT NLP

 NLP: the processing and analysis of natural language.

“IR makes NLP useful, NLP makes IR interesting.”
Bag-of-Words Tricks

 Can you guess what this is about?

 Per is salary hour $560 John’s.
 John’s salary per hour is $560.
 Main idea: re-ordering doesn’t destroy the topic.

 Individual words are “building blocks”.
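The re-ordering example can be checked directly: reduced to a bag of words, both sentences give the same bag. A minimal sketch using `collections.Counter` as the bag, assuming whitespace tokenization after lowercasing and stripping the trailing period; the helper `bag` is hypothetical.

```python
from collections import Counter


def bag(text: str) -> Counter:
    # Hypothetical helper: bag of words (word -> count); word order is discarded.
    return Counter(text.lower().rstrip(".").split())


scrambled = "Per is salary hour $560 John's."
original = "John's salary per hour is $560."

print(bag(scrambled) == bag(original))  # True
```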


Bag-of-Words Tricks

 Most search engines use BOW (bag of words):
 Treat documents and queries as bags of words.
 A bag is a set with repetitions (a multiset).
 Match: the degree of overlap between d and q.
Used in retrieval models:
 Statistical models usually use words as features.
 They decide which documents are most likely to be relevant.
BOW makes these models tractable and effective.
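The difference between a set and a bag, and matching by degree of overlap, can be sketched with `collections.Counter`, whose `&` operator keeps the minimum count of each shared element. Measuring overlap as the size of that multiset intersection is an assumed choice for illustration, not the scoring of any particular search engine. The two documents come from the incidence-matrix example below; the query and helper names are hypothetical.

```python
import re
from collections import Counter


def bag(text: str) -> Counter:
    # A bag is a set with repetitions: each word maps to its occurrence count.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def overlap(q: Counter, d: Counter) -> int:
    # Assumed overlap measure: size of the multiset intersection,
    # i.e. for each shared word, the smaller of its two counts.
    return sum((q & d).values())


d2 = bag("He likes to drink, and drink, and drink.")
d5 = bag("He likes to wink, and drink pink pink.")
q = bag("pink pink drink")  # hypothetical query

print(d2["drink"], len(set(d2)))       # 3 5  (the bag keeps counts; the set does not)
print(overlap(q, d2), overlap(q, d5))  # 1 3
```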
How Does IR See Documents?
Information Retrieval Models

Boolean Retrieval Model.


Boolean Retrieval Model

The user expresses queries as Boolean expressions using the operators AND, OR, NOT.
It is a simple retrieval model.

Ex: information AND retrieval AND NOT technology.
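Boolean operators map onto set operations: AND is intersection, OR is union, and NOT is the complement with respect to the whole collection. A minimal sketch of the example query over a small collection whose document IDs and texts are invented for illustration:

```python
import re

# Hypothetical collection: document ID -> text.
collection = {
    1: "information retrieval systems",
    2: "information retrieval technology",
    3: "database technology",
    4: "retrieval of information from text",
}
all_docs = set(collection)


def docs_with(term: str) -> set[int]:
    # IDs of documents containing the term as a whole word.
    return {doc_id for doc_id, text in collection.items()
            if term in re.findall(r"[a-z]+", text.lower())}


# information AND retrieval AND NOT technology
result = (docs_with("information")
          & docs_with("retrieval")
          & (all_docs - docs_with("technology")))
print(sorted(result))  # [1, 4]
```

Document 4 matches even though its words appear in a different order, since the model only checks for the presence of each term.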


Term-Document Incidence Matrix

Rows are terms.
Columns are documents.
An entry is 1 if the term occurs in the document, and 0 otherwise.

 Example (five documents, one per line):
 He likes to wink, he likes to drink.
 He likes to drink, and drink, and drink.
 The thing he likes to drink is ink.
 The ink he likes to drink is pink.
 He likes to wink, and drink pink pink.
[Figure: Term-Document Incidence Matrix for the example documents]
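The matrix for the five example documents can be built programmatically. A minimal sketch, assuming lowercase tokenization that drops punctuation and labelling the documents D1 to D5 in the order listed (the labels are an assumption). Rows are terms, columns are documents, and 1 means the term occurs in the document:

```python
import re

# The five example documents, assumed to be D1..D5 in the order listed.
docs = [
    "He likes to wink, he likes to drink.",
    "He likes to drink, and drink, and drink.",
    "The thing he likes to drink is ink.",
    "The ink he likes to drink is pink.",
    "He likes to wink, and drink pink pink.",
]

# Each document reduced to the set of its lowercase words.
doc_terms = [set(re.findall(r"[a-z]+", d.lower())) for d in docs]
vocabulary = sorted(set().union(*doc_terms))

# incidence[term][j] == 1 if the term occurs in document j, else 0.
incidence = {t: [1 if t in terms else 0 for terms in doc_terms] for t in vocabulary}

print("term   " + " ".join(f"D{j + 1}" for j in range(len(docs))))
for t in vocabulary:
    print(f"{t:<6} " + "  ".join(str(b) for b in incidence[t]))
```

The matrix records presence only, so the three occurrences of "drink" in D2 still give a single 1.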
Boolean Retrieval Model

Any given query divides the collection of documents into two sets:
 Retrieved (matching).
 Not retrieved (not matching).
It returns the set of documents that exactly satisfy the query (a Boolean expression).
This is called “exact-match” retrieval.
Where is it used?
Email search, library catalogs.
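With an incidence matrix, a Boolean query can be answered by combining term rows position by position: AND of two rows for AND, the complement of a row for NOT. The sketch below reuses the five example documents (labelled D1 to D5 by assumption) and evaluates the hypothetical query `drink AND pink AND NOT ink`; every document lands in exactly one of the two sets.

```python
import re

# The five example documents, assumed to be D1..D5 in the order listed.
docs = [
    "He likes to wink, he likes to drink.",
    "He likes to drink, and drink, and drink.",
    "The thing he likes to drink is ink.",
    "The ink he likes to drink is pink.",
    "He likes to wink, and drink pink pink.",
]
doc_terms = [set(re.findall(r"[a-z]+", d.lower())) for d in docs]


def row(term: str) -> list[int]:
    # Incidence row for a term: 1 if it occurs in document j, else 0.
    return [1 if term in terms else 0 for terms in doc_terms]


def and_rows(a: list[int], b: list[int]) -> list[int]:
    # Boolean AND, applied position by position.
    return [x & y for x, y in zip(a, b)]


def not_row(a: list[int]) -> list[int]:
    # Boolean NOT: complement of a row.
    return [1 - x for x in a]


# Hypothetical query: drink AND pink AND NOT ink
hits = and_rows(and_rows(row("drink"), row("pink")), not_row(row("ink")))

retrieved = [f"D{j + 1}" for j, bit in enumerate(hits) if bit]
not_retrieved = [f"D{j + 1}" for j, bit in enumerate(hits) if not bit]
print("Retrieved:", retrieved)          # Retrieved: ['D5']
print("Not retrieved:", not_retrieved)  # Not retrieved: ['D1', 'D2', 'D3', 'D4']
```

The result is a plain split into matching and non-matching documents, with no ordering among the retrieved ones.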
