
Information

Retrieval and
Search Engines

CAI Dr. Ahmed El-Shaer


Ch 1 • Course information
❖ Time and location: Lecture: Monday, 11:00 – 1:00; Lab: Wednesday, 9:00 – 11:00
❖ Instructor e-mail: ahshaer1@aast.edu
❖ Required text: Introduction to Information Retrieval, Christopher D. Manning, 7th Edition.
❖ Grading:
– Homework Assignments: 10%
– 7th : 30%
– 12th : 20%
– Final: 40%

introduction
Course objectives

❑ Understand the concepts, techniques, and algorithms in modern search engines and IR systems.

❑ Learn how to evaluate and improve retrieval systems.

❑ Explore advanced topics like web search algorithms and machine learning for IR.

DS414 information Retrieval & Search Engines 4


Definition of Information Retrieval



Real-world of Information Retrieval



Real-world of Information Retrieval


Definition of Information Retrieval


Applications of Information Retrieval


Definition of Information Retrieval

Definition of Information Retrieval



Definition of Information Retrieval



Historical Perspective of IR



Definition of Information Retrieval



Expert search


Definition of Information Retrieval



Historical Perspective of IR



Historical Perspective of IR


Introduction to IR
Search and Information Retrieval

Search on the Web is a daily activity for many people throughout the world
Search and communication are the most popular uses of the computer
Applications involving search are everywhere
The field of computer science that is most involved with R&D for search is information retrieval (IR)



Information Retrieval nutshell



Basics of IR



Two main issues in IR

Effectiveness
● need to find relevant documents
● needle in a haystack
● very different from relational DBs (SQL)

Efficiency
● need to find them quickly
● vast quantities of data (tens of billions of pages)
● thousands of queries per second (Google, ~40,000)
● data constantly changes, need to keep up



Main components of IR


Main components of IR: Documents
❑ Document : the elements to be retrieved
❑ Unstructured nature
❑ Unique ID
❑ Examples:
➢ web-pages, emails, book, page, sentence, tweets

➢ photos, videos, musical pieces, code

➢ answers to questions

➢ product descriptions, advertisements

➢ people
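
The properties listed above (a unique ID, unstructured content, many possible kinds of item) can be sketched as a small data structure. This is only an illustrative sketch: the `Document` class, its field names, and the sample texts are assumptions made for the example, not something defined in the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical container for one retrievable item."""
    doc_id: int   # unique ID
    kind: str     # e.g. "web-page", "tweet", "product description"
    content: str  # unstructured text


# A toy collection keyed by the unique ID (sample texts are made up).
collection = {
    doc.doc_id: doc
    for doc in [
        Document(1, "web-page", "Are chat apps secure? A short overview."),
        Document(2, "tweet", "New jaguar cub born at the city zoo"),
        Document(3, "product description", "Apple laptop with 16 GB of memory"),
    ]
}

print(collection[2].kind, "->", collection[2].content)
```

Keying the collection by ID keeps lookup of a retrieved item cheap, whatever kind of item it is.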
Main components of IR: Queries
❑ Free text to express user’s information need
❑ Same information need can be described by different queries such as:
● Are chat apps secure?
● Live chat protection
● Breaches in online chat
❑ Same query can represent different information needs
● Apple
● Jaguar



Main components of IR: Queries –different forms

Web search: keywords, narrative, …
Image search: keywords, sample image, …
Music search: tunes, sound, singer, …
Scholar search: author, title, book, …
Main components of IR: Relevance
❑ At an abstract level, IR is about:
● does item d match query q? … or …
● is item d relevant to query q?

Relevance is a tricky notion


● will the user like it / click on it?
● will it help the user achieve a task? (satisfy information need)
● is it novel (not redundant)?

Relevance means similarity


● i.e., d and q share similar “meaning”
● about the same topic / subject / issue
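
One simple way to approximate "relevance as similarity" is to compare the words that d and q share. The sketch below uses cosine similarity over raw word counts. That choice, the helper names `tokenize` and `cosine_similarity`, and the sample documents are all assumptions made for illustration. Word overlap is only a rough proxy for shared meaning.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(query, document):
    """Cosine similarity between the word-count vectors of q and d."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


# Made-up documents for the example.
docs = {
    "d1": "Security breaches in online chat apps",
    "d2": "The jaguar is a large cat native to the Americas",
}
query = "online chat security"

# Rank document IDs from most to least similar to the query.
ranked = sorted(docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]),
                reverse=True)
print(ranked)
```

A purely lexical score like this one gives no credit to a document that expresses the same need in different words (for example "live chat protection"), which is one reason relevance is harder than matching.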
Information Need/Query/Relevance

Information need
• Topic about which the user desires to know more
• In the user’s mind!

Query
• What the user conveys to the computer
• Considered one representation of the information need

Relevance
• Document having a value with respect to the information need
• i.e., a document is relevant if it satisfies the information need
What is the challenge in relevance?
❑ No clear semantics!
● “William Shakespeare”
• The author's history? A list of plays? A play by him?
❑ Inherent ambiguity of language!
● polysemy: “Apple”, “Jaguar”
❑ Relevance is highly subjective!
● Relevance judgments: binary (yes/no) or graded (perfect/excellent/good/fair/bad)
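
The two judgment scales can be related with a threshold: every grade at or above some cut-off counts as "relevant". The sketch below assumes the cut-off is "good"; the cut-off, the `to_binary` helper, and the sample judgments are illustrative assumptions, not part of the slides.

```python
# Graded scale from the slide, ordered from worst to best.
GRADES = ["bad", "fair", "good", "excellent", "perfect"]


def to_binary(grade, threshold="good"):
    """Map a graded judgment to yes/no; the threshold is an assumed cut-off."""
    return GRADES.index(grade) >= GRADES.index(threshold)


# Hypothetical judgments made by one assessor for one query.
judgments = {"d1": "perfect", "d2": "fair", "d3": "good"}
binary = {doc_id: to_binary(grade) for doc_id, grade in judgments.items()}
print(binary)
```

Moving the threshold changes which documents count as relevant, so where to draw the line is itself a subjective choice.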



Information Retrieval (IR) is …



Information Retrieval

Primary focus of IR since the 50s has been on text and documents
Dimensions of IR
• IR is more than just text, and more than just web search
– although these are central
• People doing IR work with different media, different types
of search applications, and different tasks
• New applications increasingly involve new media
– e.g., video, photos, music, speech
• Like text, content is difficult to describe and compare
– text may be used to represent them (e.g. tags)
• IR approaches to search and evaluation are appropriate
Big Issues in IR: Relevance



IR and Search Engines
A search engine is the practical application of information retrieval techniques to large-scale text collections

Information Retrieval
Relevance
- Effective ranking
Evaluation
- Testing and measuring
Information needs
- User interaction

Search Engines
Performance
- Efficient search and indexing
Incorporating new data
- Coverage and freshness
Scalability
- Growing with data and users
Adaptability
- Tuning for applications
Specific problems
- e.g. Spam
Why Information Retrieval:

Information Overload:
“… The world produces between 1 and 2 exabytes (10^18 bytes) of unique information per year, which is roughly 250 megabytes for every man, woman, and child on earth. …” (Lyman & Hal 03)
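
The per-person figure in the quotation can be checked with a short calculation. The sketch below assumes a world population of about 6 billion, a figure the quotation does not state, and uses decimal units (1 exabyte = 10^18 bytes, 1 megabyte = 10^6 bytes). The function name is made up for the example.

```python
EXABYTE = 10**18              # bytes, as given in the quotation
MEGABYTE = 10**6              # bytes (decimal megabyte)
WORLD_POPULATION = 6 * 10**9  # assumption: roughly 6 billion people


def megabytes_per_person(exabytes_per_year, population=WORLD_POPULATION):
    """Convert a yearly world total in exabytes to megabytes per person."""
    return exabytes_per_year * EXABYTE / population / MEGABYTE


for exabytes in (1, 1.5, 2):
    print(f"{exabytes} EB/year is about "
          f"{megabytes_per_person(exabytes):.0f} MB per person")
```

Under that assumption, the midpoint of the quoted range (1.5 exabytes) works out to about 250 megabytes per person, which matches the quotation.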



Concepts of IR



Core concepts of IR



IR Applications
Information Retrieval: a gold mine of applications:

❑ Web Search

❑ Information Organization: text categorization; document clustering

❑ Information Recommendation by content or by collaborative information

❑ Information Extraction: deep analysis of the surface text data

❑ Question-Answering: find the answer directly

❑ Multimedia Information Retrieval: image, video

❑ Information Visualization: let users understand the results in the best way

❑ ………………………..
