
Intelligent Information Retrieval

using

Tarun Kumar Jaiswal


C.S. – IV year

Information Retrieval
(IR)
• The indexing and retrieval of textual
documents.
• Searching for pages on the World Wide Web.
• Concerned firstly with retrieving relevant
documents to a query.
• Concerned secondly with retrieving from
large sets of documents efficiently.

Typical IR Task

• Given:
– A corpus of textual natural-language
documents.
– A user query in the form of a textual string.
• Find:
– A ranked set of documents that are relevant to
the query.
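
The task above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the corpus, the `tokenize` and `rank` names, and the scoring rule (number of distinct query words found in a document) are all made up for the example and are not a method given in these slides.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rank(corpus, query):
    """Return (doc_id, score) pairs for matching documents, best first.

    Assumed scoring rule: how many distinct query words occur in the document.
    """
    query_words = set(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(query_words & set(tokenize(text)))
        if overlap:
            scored.append((doc_id, overlap))
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical three-document corpus.
corpus = {
    "Doc1": "The bat flew out of the cave at dusk.",
    "Doc2": "He swung the baseball bat and hit the ball.",
    "Doc3": "Apple released a new phone.",
}
print(rank(corpus, "baseball bat"))  # [('Doc2', 2), ('Doc1', 1)]
```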

IR System

[Diagram: a document corpus and a query string are the inputs to the IR system, which outputs a ranked list of documents (1. Doc1, 2. Doc2, 3. Doc3, ...).]

Relevance

• Relevance is a subjective judgment and may
include:
– Being on the proper subject.
– Being timely (recent information).
– Being authoritative (from a trusted source).
– Satisfying the goals of the user and his/her
intended use of the information (information
need).

Keyword Search

• Simplest notion of relevance is that the
query string appears verbatim in the
document.
• Slightly less strict notion is that the words
in the query appear frequently in the
document, in any order (bag of words).
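
The two notions above can be compared side by side. The sketch below assumes case-insensitive matching and a very simple tokenizer; the function names and the sample document are invented for the illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def verbatim_match(query, document):
    """Strict notion: the query string appears as-is (ignoring case)."""
    return query.lower() in document.lower()

def bag_of_words_score(query, document):
    """Looser notion: total occurrences of the query words, in any order."""
    doc_tokens = tokenize(document)
    return sum(doc_tokens.count(word) for word in set(tokenize(query)))

doc = "Retrieval of information from large document sets; information needs vary."
print(verbatim_match("information retrieval", doc))      # False
print(bag_of_words_score("information retrieval", doc))  # 3
```

The document never contains the exact phrase, so the strict test fails, while the bag-of-words score still counts two occurrences of "information" and one of "retrieval".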

Problems with Keywords

• May not retrieve relevant documents that
include synonymous terms.
– “restaurant” vs. “café”
– “PRC” vs. “China”
• May retrieve irrelevant documents that
include ambiguous terms.
– “bat” (baseball vs. mammal)
– “Apple” (company vs. fruit)
– “bit” (unit of data vs. act of eating)
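
Both problems show up in a small sketch. The synonym table here is a hypothetical, hand-written stand-in, and expanding the query with it is just one possible remedy, named only to make the synonym problem visible; it is not a technique described in these slides.

```python
import re

# Hypothetical synonym table, written by hand for this example only.
SYNONYMS = {"restaurant": {"café"}, "china": {"prc"}}

def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))

def matches(term, document, expand=False):
    """Keyword match on a single term, optionally adding its listed synonyms."""
    terms = {term.lower()}
    if expand:
        terms |= SYNONYMS.get(term.lower(), set())
    return bool(terms & tokenize(document))

cafe_doc = "A cosy café near the station serves lunch."
print(matches("restaurant", cafe_doc))               # False: relevant page missed
print(matches("restaurant", cafe_doc, expand=True))  # True: found via the synonym

animal_doc = "The bat hung upside down in the cave."
print(matches("bat", animal_doc))  # True, even if the user meant a baseball bat
```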
Intelligent IR

• Taking into account the meaning of the
words used.
• Taking into account the order of words in
the query.
• Adapting to the user based on direct or
indirect feedback.
• Taking into account the authority of the
source.

Web Search System

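[Diagram: a spider crawls the Web to build the document corpus; the corpus and a query string are the inputs to the IR system, which outputs a ranked list of pages (1. Page1, 2. Page2, 3. Page3, ...).]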

Google’s Mission
• Googol = the number 1 followed by one
hundred zeroes.
• Organize the immense amount of
information available on the web.
• There are some 100 billion items on the
Internet.
• Google currently indexes 8,058,044,651
web pages.
• Searches often complete in less than half a second.

Crash Course in Advanced Operators

Advanced Google Searching

Advanced Search Domain
Advanced Search can restrict results to pages:
• that contain ALL the search terms you type in
• that contain the exact phrase you type in
• that contain at least one of the words you type in
• that do NOT contain any of the words you type in
• written in a certain language
• created in a certain file format
• that have been updated within a certain period of time
• that contain numbers within a certain range
• within a certain domain, or website
• that are available for anyone to use, share or modify, even
commercially
• that don't contain "adult" material

"+" search
• Google ignores common words and characters, such as
"where", "the" and "how", as well as certain single digits
and letters, which slow down your search without improving
the results. Google indicates if a word has been excluded by
displaying details on the results page below the search box.
• If a common word is essential to getting the results you
want, you can include it by putting a "+" sign in front of it.
(Be sure to include a space before the "+" sign.)
• For example, here's how to ensure that Google includes the
"I" in a search for Star Wars, Episode I:

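A query of this shape could be assembled as in the sketch below. The `require_word` helper is a hypothetical name, and the code only builds the query string following the rule stated on this slide; it does not contact Google.

```python
def require_word(query, word):
    """Prefix every occurrence of `word` in the query with "+".

    Joining the words with single spaces keeps a space before each "+".
    """
    return " ".join("+" + w if w == word else w for w in query.split())

print(require_word("Star Wars Episode I", "I"))  # Star Wars Episode +I
```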
Synonym search

• If you want to search not only for your
search term but also for its synonyms, place
the tilde sign ("~") immediately in front of
your search term.
• For example, here's how to search for food
facts and nutrition and cooking information:
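
One way such a query might be written, sketched in Python. The helper name is hypothetical, and marking both "food" and "facts" with a tilde is an assumption about which terms should be expanded.

```python
def with_synonyms(*terms):
    """Put "~" immediately in front of each term, per the rule above."""
    return " ".join("~" + term for term in terms)

print(with_synonyms("food", "facts"))  # ~food ~facts
```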

"OR" search

• To find pages that include either of two
search terms, add an uppercase OR between
the terms.
• For example, here's how to search for a
vacation in either London or Paris:
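
A minimal sketch of building such a query follows. The `either` helper and the lowercase spelling of the city names are choices made for the example; only the uppercase OR comes from the slide.

```python
def either(common, *alternatives):
    """Combine a shared term with alternatives joined by an uppercase OR."""
    if len(alternatives) < 2:
        raise ValueError("need at least two alternatives to join with OR")
    return f"{common} " + " OR ".join(alternatives)

print(either("vacation", "london", "paris"))  # vacation london OR paris
```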

Domain search

• You can use Google to search only within
one specific website by entering the search
terms you're looking for, followed by the
word "site" and a colon followed by the
domain name.
• For example, here's how you'd find
admission information on the Stanford
University site:
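
The pattern can be sketched as below. The helper name is hypothetical, and `www.stanford.edu` is assumed as the domain for the illustration.

```python
def site_search(terms, domain):
    """Append "site:" plus the domain name, with no space after the colon."""
    if not domain or any(ch.isspace() for ch in domain):
        raise ValueError("domain must be a non-empty name without spaces")
    return f"{terms} site:{domain}"

print(site_search("admission", "www.stanford.edu"))  # admission site:www.stanford.edu
```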

Numrange search
• The numrange operator searches for results containing
numbers in a given range. You can use numrange to set
ranges for everything from dates (Willie Mays
1950..1960) to weights (5000..10000 kg truck). Just add
two numbers, separated by two periods, with no spaces,
into the search box along with your search terms, and
specify a unit of measurement or some other indicator of
what the number range represents.
• For example, here's how you'd search for a DVD player
that costs between $50 and $100:
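
The range syntax can be generated with a small helper. This is a sketch only: the `numrange` function and its `prefix` and `unit` parameters are names invented here, and repeating the currency sign on both numbers is an assumption.

```python
def numrange(low, high, prefix="", unit=""):
    """Format two numbers separated by two periods, with no spaces between."""
    if low > high:
        raise ValueError("low must not exceed high")
    text = f"{prefix}{low}..{prefix}{high}"
    return f"{text} {unit}" if unit else text

print("DVD player " + numrange(50, 100, prefix="$"))  # DVD player $50..$100
print(numrange(5000, 10000, unit="kg") + " truck")    # 5000..10000 kg truck
print("Willie Mays " + numrange(1950, 1960))          # Willie Mays 1950..1960
```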

Fill in the blanks "*" search

• Sometimes the best way to ask a question is
to get Google to 'fill in the blank' for you.
You can do this by adding an asterisk "*" in
the part of the sentence or question that you
want filled in.
• For example, here's how you'd search for
who invented the parachute:
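
A possible phrasing, sketched in Python. The wording of the sentence and the `___` placeholder convention are assumptions made for the example; only the asterisk comes from the slide.

```python
def fill_in_the_blank(template, blank="___"):
    """Replace the placeholder in a sentence with the "*" wildcard."""
    if blank not in template:
        raise ValueError("template has no blank to fill")
    return template.replace(blank, "*")

print(fill_in_the_blank("the parachute was invented by ___"))
# the parachute was invented by *
```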

Operator Examples

Live Examples: Search

Live Examples: Special Searches

That’s all for now!
Happy Googling!!

Regards
Tarun Kumar Jaiswal
jaiswal.tarun@gmail.com