Information retrieval (IR): traditional model
1. Why? Rationale for the module. Definition of IR
2. System & user components
3. Exact match & best match searches
4. Strengths & weaknesses
Tefko Saracevic
Why?
Every online database, every search engine, everything that is searched online is based, in some way or another, on principles developed in IR.
IR is at the heart of the searching used in systems such as DIALOG, LexisNexis & others.
IR: original definition
"Information retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, techniques, or machines are employed to carry out the operation."
Calvin Mooers, 1951
IR: objective & problems
Provide the users with effective access to & interaction with information resources.
Problems addressed:
1. How to organize information intellectually?
2. How to specify search & interaction intellectually?
3. What systems & techniques to use for those processes?
Where do you fit? With what problems do you deal?
IR models
A model depicts and represents what is involved: a choice of features, processes, and things for consideration.
Description of traditional IR model
It has two streams of activities:
one is the system side, with processes performed by the system;
the other is the user side, with processes performed by users & intermediaries (you).
These two sides led to system orientation & user orientation.
On the system side automatic processing is done; on the user side human processing is done.
Traditional IR model
Figure: two streams of processes that meet at matching.
System side: Acquisition (documents, objects) → Representation (indexing, ...) → File organization (indexed documents) → Matching
User side: Problem (information need) → Representation (question) → Query (search formulation) → Matching
Matching (searching) → Retrieved objects, with feedback
Acquisition
(system)
Content: what is in files, resources
in DIALOG: first part of the blue sheets: File Description, Subject Coverage
Importance:
determines the contents, i.e., what is in the file
key to file and resource selection!!!
Representation of documents, objects
(system)
Indexing, in many ways:
free-text terms (even in full texts)
controlled vocabulary: thesaurus
manual & automatic techniques
Abstracting; summarizing
Bibliographic description:
author, title, sources, date
metadata
Classifying, clustering
Organizing in fields & limits
in DIALOG: Basic Index, Additional Index, Limits
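Some of the indexing options above can be sketched in Python. This is a minimal illustration, assuming a record with author, title, abstract and date fields, a toy stop-word list, and a hypothetical mini-thesaurus (`THESAURUS`); none of these come from DIALOG or from any real controlled vocabulary. Free-text terms are extracted automatically, and the thesaurus maps some of them to controlled descriptors.

```python
# Hypothetical mini-thesaurus: maps free-text terms to controlled descriptors.
THESAURUS = {
    "ir": "Information retrieval",
    "retrieval": "Information retrieval",
    "search": "Information retrieval",
    "library": "Libraries",
    "libraries": "Libraries",
}

# Toy stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def free_text_terms(text):
    """Automatic free-text indexing: lowercase words, punctuation stripped, stop words dropped."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return sorted({w for w in words if w and w not in STOP_WORDS})

def index_record(record):
    """Representation = bibliographic fields + free-text terms + controlled descriptors."""
    terms = free_text_terms(record["title"] + " " + record["abstract"])
    descriptors = sorted({THESAURUS[t] for t in terms if t in THESAURUS})
    return {
        "author": record["author"],
        "title": record["title"],
        "date": record["date"],
        "free_text": terms,
        "descriptors": descriptors,
    }

doc = {
    "author": "Smith, J.",
    "title": "Search in digital libraries",
    "abstract": "A study of retrieval tools.",
    "date": 2001,
}
print(index_record(doc)["descriptors"])  # ['Information retrieval', 'Libraries']
```

Keeping the bibliographic fields separate from the term lists mirrors the idea of organizing a record in fields, so a search can be limited to, say, the author field.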
File organization
(system)
Sequential: record (document) by record
Inverted: term by term; a list of records under each term
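A sketch of the two file organizations, assuming records are short strings and terms are simply whitespace-separated words (an illustrative simplification). The sequential search reads every record; the inverted file is built once and then answers a term lookup with its list of records.

```python
from collections import defaultdict

# Hypothetical file of three records.
records = {
    1: "digital library services",
    2: "history of information retrieval",
    3: "digital preservation",
}

def sequential_search(records, term):
    """Sequential file: examine every record, one after another."""
    return sorted(rid for rid, text in records.items() if term in text.split())

def build_inverted_file(records):
    """Inverted file: for each term, the list of records that contain it."""
    inverted = defaultdict(set)
    for rid, text in records.items():
        for term in text.split():
            inverted[term].add(rid)
    return {term: sorted(ids) for term, ids in inverted.items()}

inverted = build_inverted_file(records)
print(inverted["digital"])                    # [1, 3]
print(sequential_search(records, "digital"))  # [1, 3]
```

Both give the same answer; the inverted file costs extra work when records are added, but a lookup no longer requires reading every record.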
Problem
(user)
Related to the user's task, situation
Problems vary in specificity, clarity
Representation: question
(user & possibly system)
Non-mediated: end user alone
Mediated: intermediary + user
interviews; human-human interaction
Question analysis
selection, elaboration of terms
various tools may be used:
thesaurus, classification schemes, dictionaries, textbooks, catalogs
Focus toward:
deriving search terms & logic
selection of files, resources
Clarifying the difference
The question is what the user asks and what you may then have elaborated.
The query is what is asked of the computer, to be matched against what was put into the file.
The question is transformed into a query.
Question:
I am interested in major historical developments in the area of information retrieval.
Query:
history information retrieval (in Google)
history AND information(w)retrieval (in DIALOG; plus you have to select which file(s) to search)
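The transformation from question to query can be sketched as below, assuming the search concepts have already been chosen by hand during question analysis (the function names are hypothetical). It only reproduces the two query forms from the example; it does not model file selection or the rest of DIALOG's command syntax.

```python
def to_keyword_query(concepts):
    """Plain keyword query, as typed into a web search engine: words separated by spaces."""
    return " ".join(concepts)

def to_dialog_style_query(concepts):
    """Boolean query in the style of the DIALOG example: concepts joined with AND,
    words of a multi-word phrase joined with the (w) adjacency operator."""
    return " AND ".join("(w)".join(c.split()) for c in concepts)

# Concepts picked out of the question during question analysis.
concepts = ["history", "information retrieval"]
print(to_keyword_query(concepts))       # history information retrieval
print(to_dialog_style_query(concepts))  # history AND information(w)retrieval
```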
Matching: searching
(user & system)
Process of matching, comparing
search: which documents in the file match the query as stated?
Retrieved documents
(from system to user)
Various orders of output:
Last In First Out (LIFO); sorted
ranked by relevance
ranked by other characteristics
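The orders of output can be illustrated with a few hypothetical hits. The sketch assumes record IDs grow in the order records were added to the file, so sorting them in descending order gives last in, first out; the relevance scores are made-up numbers standing in for whatever ranking method a system uses.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    record_id: int  # assumed to increase in order of addition to the file
    title: str
    year: int
    score: float    # hypothetical relevance score from some ranking method

hits = [
    Hit(101, "Boolean searching", 1995, 0.40),
    Hit(102, "Ranking algorithms", 2003, 0.90),
    Hit(103, "Online databases", 1999, 0.65),
]

lifo = sorted(hits, key=lambda h: h.record_id, reverse=True)  # last in, first out
by_year = sorted(hits, key=lambda h: h.year)                  # sorted on a field
ranked = sorted(hits, key=lambda h: h.score, reverse=True)    # ranked by relevance score

print([h.record_id for h in lifo])     # [103, 102, 101]
print([h.record_id for h in by_year])  # [101, 103, 102]
print([h.record_id for h in ranked])   # [102, 103, 101]
```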
Boolean algebra
Operates on sets
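Because Boolean operators work on sets, they map directly onto Python's set operators. The postings below are hypothetical record numbers for each term. Note that A NOT B is set difference (records in A that are not in B), not the complement of B.

```python
# Hypothetical postings: record numbers that contain each term.
digital = {1, 2, 3, 5}
libraries = {2, 3, 4}
rutgers = {3, 4, 6}

print(sorted(digital & libraries))              # AND: [2, 3]
print(sorted(digital | libraries))              # OR:  [1, 2, 3, 4, 5]
print(sorted(digital - libraries))              # NOT: [1, 5]
print(sorted((digital | libraries) & rutgers))  # (A OR B) AND C: [3, 4]
```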
Potential problems
But beware:
digital AND library will retrieve documents that have digital library (together as a phrase), but also documents that have digital in the first paragraph and library in the third section, 5 pages later, and that do not deal with digital libraries at all.
Thus, to retrieve the exact phrase digital library, in Google you would ask for "digital library" (in quotes) and in DIALOG for digital(w)library.
digital NOT library will retrieve documents that have digital and suppress those that, along with digital, also have library; but sometimes the suppressed documents may very well be relevant. Thus, NOT is also known as the dangerous operator.
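The difference between AND and an adjacency operator can be sketched with a positional index (word positions per term). `matches_adjacent` is a simplified check in the spirit of DIALOG's (w), assuming it means "the second word immediately follows the first"; it is not DIALOG's implementation. The two sample documents are invented.

```python
def positions(text):
    """Positional index for one document: term -> list of word positions."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index

def matches_and(text, a, b):
    """a AND b: both terms occur anywhere in the document."""
    idx = positions(text)
    return a in idx and b in idx

def matches_adjacent(text, a, b):
    """Simplified a(w)b: some occurrence of b comes right after an occurrence of a."""
    idx = positions(text)
    return any(p + 1 in idx.get(b, []) for p in idx.get(a, []))

doc1 = "we study the digital library as a service"
doc2 = "digital cameras are popular and later the library closed"

print(matches_and(doc1, "digital", "library"), matches_adjacent(doc1, "digital", "library"))  # True True
print(matches_and(doc2, "digital", "library"), matches_adjacent(doc2, "digital", "library"))  # True False
```

AND accepts doc2 even though it has nothing to do with digital libraries; the adjacency check rejects it.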
Figure: Venn diagrams of two overlapping sets A and B; region 1 is A only, region 2 is A and B together, region 3 is B only.
A AND B. Shade 2
digital AND libraries
A OR B. Shade 1, 2, 3
digital OR libraries
A NOT B. Shade 1
digital NOT libraries
Figure: Venn diagram of three overlapping sets A, B and C, with numbered regions.
(A OR B) AND C. Shade 4, 5, 6
(digital OR libraries) AND Rutgers
(A OR B) NOT C. Shade what?
(digital OR libraries) NOT Rutgers
4. Strengths & weaknesses
Best match
Strengths:
allows for free terminology
provides for a ranked output
provides for a cut-off: output of any size
BUT:
does not include logic
the ranking method (algorithm) is not transparent
whose relevance?
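A minimal best-match sketch, assuming one textbook-style weighting (term frequency × inverse document frequency, with idf = log(N/n)); it is not the ranking used by Google, DIALOG or any particular engine, and the four documents are hypothetical. Every document gets a score, the output is ranked, and `cutoff` sets the output size. It also shows the weaknesses listed above: the query carries no Boolean logic, and "relevance" is whatever the formula computes.

```python
import math
from collections import Counter

# Hypothetical collection of four short documents.
docs = {
    "d1": "digital library services",
    "d2": "history of the library",
    "d3": "cameras and images",
    "d4": "library news and library staff",
}

def tokenize(text):
    """Split text into lowercase words (no stemming, no stop words)."""
    return text.lower().split()

def idf(term, docs):
    """Inverse document frequency log(N/n): terms found in fewer documents weigh more."""
    n = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log(len(docs) / n) if n else 0.0

def best_match(query, docs, cutoff=2):
    """Score every document by the summed tf*idf of the query terms;
    return up to `cutoff` top-ranked documents that have a positive score."""
    terms = tokenize(query)
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(tokenize(text))
        scores[doc_id] = sum(tf[t] * idf(t, docs) for t in terms)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, score in ranked[:cutoff] if score > 0]

print(best_match("digital library", docs))  # ['d1', 'd4']
```

Raising `cutoff` to 10 returns ['d1', 'd4', 'd2']; d3 shares no term with the query and is left out.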
Strengths of traditional IR model
Lists the major components in both the system & user branches
Suggests:
what to explain to users about the system, if needed
what to ask of users for more effective searching (problem ...)
Weaknesses
Does not address or account for interaction & judgment of results by users
identifies interaction with search only
interaction is a much richer process
Interactive models
Explored in the next module: Module 5