
Measuring system performance
The library
A system view

Environment

Inputs: energy, money, materials, personnel, information
Transformational process
Outputs: products, services
Users
System performance measures

recall precision

relevance
Robert Taylor's four levels of question formation
Q1 The actual but unexpressed need for information (the visceral need)
Q2 The conscious, within-brain description of the need (the conscious need)
Q3 The formal statement of the need (the formalized need)
Q4 The question as presented to the information system (the compromised need)
Taylor, Robert S. 1968. Question-negotiation and information seeking in
libraries. College & Research Libraries 29(3): 178-194 (May 1968).
System-defined relevance
"My feet are killing me."

find health AND feet

90%  The health of the lumber industry in terms of cubic feet of lumber produced
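
A minimal sketch of why the lumber record satisfies the query: a Boolean AND search only checks that every query term occurs in the record, not what the record means. The function names and the tokenizer below are illustrative assumptions, not any particular system's implementation.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_and(query_terms, document):
    """Return True when every query term occurs in the document."""
    tokens = tokenize(document)
    return all(term.lower() in tokens for term in query_terms)

doc = ("The health of the lumber industry in terms of "
       "cubic feet of lumber produced")

# Query from the slide: find health AND feet
print(matches_and(["health", "feet"], doc))  # True: both words occur
```

The match is correct by the system's definition of relevance, even though the record is useless to someone with aching feet.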
Information retrieval process
Question formulation
Relevancy determination
System: Which documents are relevant to the query?
User: Are these documents relevant to my needs?
Defining relevance

System-defined relevance    vs.    User-defined relevance
Objective.                         Subjective.
Often topical.                     Situational.
Does it match the query?           Is it useful?
User-defined relevance
"My feet are killing me."
The effect of lysergic acid diethylamide ingestion on toenail fungus in cloned mice

Soothing remedies for aching feet

Controlling the body by controlling the mind--meditative techniques for dealing with pain
Determining topical relevance
• Analyze the work to determine what it is about
• Assign to the document one or more terms from a finite list of topics
• Users can then search on those topic indicators
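
The indexing steps above can be sketched as a lookup table from topic terms to documents. The vocabulary, document ids, and term assignments here are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary (the "finite list of topics").
TOPICS = {"foot care", "pain management", "meditation", "forestry"}

def build_index(assignments):
    """Map each topic term to the set of document ids indexed under it."""
    index = defaultdict(set)
    for doc_id, terms in assignments.items():
        for term in terms:
            if term not in TOPICS:
                raise ValueError(f"{term!r} is not in the controlled vocabulary")
            index[term].add(doc_id)
    return index

# Hypothetical indexer decisions about what each document is about.
assignments = {
    "doc1": ["foot care", "pain management"],
    "doc2": ["pain management", "meditation"],
    "doc3": ["forestry"],
}
index = build_index(assignments)

# A user searching on a topic indicator gets every document assigned to it.
print(sorted(index["pain management"]))  # ['doc1', 'doc2']
```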
Recall

Recall = (No. of relevant documents retrieved) / (Total no. of relevant documents in the file)
Precision

Precision = (No. of relevant documents retrieved) / (Total no. of documents retrieved from the file)
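
A small worked example of the two ratios, assuming the relevant documents in the file are already known (in practice they must be judged by people). The document ids are made up for illustration.

```python
def recall(retrieved, relevant):
    """Relevant documents retrieved / all relevant documents in the file."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Relevant documents retrieved / all documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

relevant = {"d1", "d2", "d3", "d4"}   # hypothetical relevance judgments
retrieved = {"d1", "d2", "d5"}        # hypothetical search result

print(recall(retrieved, relevant))     # 2 of 4 relevant documents found
print(precision(retrieved, relevant))  # 2 of 3 retrieved documents relevant
```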
Precision vs. Recall
An inverse relationship

As the level of recall rises, the level of precision generally declines, and vice versa.

The Cranfield experiments (1957 & 1962)
Cyril Cleverdon, p.i.
Precision vs. Recall
Subject: sexual dimorphism
Word stemming: Recall rises, Precision falls

sex sexes sexual
sexy sexier sexiest

Field-specific searches: Recall falls, Precision rises

DE,TI/sexual()dimorphism
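
A toy illustration of how stemming tends to raise recall at the cost of precision. The five titles and their relevance judgments are invented, and the "stemmer" is just a crude prefix match on "sex", not a real stemming algorithm.

```python
# Hypothetical collection; relevance to "sexual dimorphism" is assumed.
docs = {
    "d1": "sexual dimorphism in birds",
    "d2": "differences between the sexes in beetle size",
    "d3": "sexiest celebrities of the year",
    "d4": "sexy advertising and consumer behavior",
    "d5": "dimorphism in sexual selection models",
}
relevant = {"d1", "d2", "d5"}

def search(word_matches):
    """Return ids of documents containing at least one matching word."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if any(word_matches(word) for word in text.lower().split())
    }

def recall_precision(retrieved):
    """Return (recall, precision) for a non-empty retrieved set."""
    hits = len(retrieved & relevant)
    return hits / len(relevant), hits / len(retrieved)

exact = search(lambda w: w == "sexual")            # no stemming
stemmed = search(lambda w: w.startswith("sex"))    # crude prefix "stem"

print(recall_precision(exact))    # misses d2, but retrieves nothing irrelevant
print(recall_precision(stemmed))  # finds every relevant title, plus d3 and d4
```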
User-defined relevance
"Relevance appears to be a
subjective quality, unique
between the individual and a
given document supporting
the assumption that
relevance can only be judged
by the information user."
Miranda Pao
Years later
"My feet are still killing me."
The effect of lysergic acid diethylamide ingestion on toenail fungus in cloned mice

Soothing remedies for aching feet

Controlling the body by controlling the mind--meditative techniques for dealing with pain
Factors affecting
relevance (1)
• Purpose of the information
• Situation of the user
• Level at which the information
source is written
– Journal of the Amer. Med. Assn.
– Healthy times
Factors affecting
relevance (2)
• Subject knowledge of the user
– Is the data new to the user?
– Does the information relate to the
user's prior knowledge?
• Values - ethical, social,
philosophical, political, religious,
legal
User-defined relevance

Subjectivity and fluidity make it difficult to use as a measuring tool for system performance
Incorporating user-defined
relevance into information
retrieval systems (1)

• User performs search


• System retrieves results
.
.
.
Incorporating user-defined
relevance into information
retrieval systems (2)
• System asks user if he/she would like to retrieve similar documents
– Search for other documents with similar word frequencies
– Search for other documents with same subject descriptors
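
One common way to compare word frequencies is cosine similarity between term-count vectors. The sketch below assumes that measure; the slides do not say which one a given system uses, and the candidate titles are made up.

```python
import math
import re
from collections import Counter

def term_frequencies(text):
    """Count how often each word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# The document the user marked as relevant.
liked = term_frequencies("Soothing remedies for aching feet")

# Hypothetical candidates to rank against it.
candidates = {
    "foot soak": "Home remedies for tired aching feet",
    "lumber": "Cubic feet of lumber produced",
}
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(liked, term_frequencies(candidates[name])),
    reverse=True,
)
print(ranked)  # the foot-soak title shares more words, so it ranks first
```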
Search for other documents
with same subject
descriptors
Main Author: Gribbin, John R.
Title: In search of Schrodinger's cat : quantum physics and reality / by John Gribbin.
Subject(s): Schrodinger, Erwin, 1887-1961.
            Quantum theory History.
            Reality.
Amazon.com
Assisting users in determining relevancy

Title    Abstract    Indexing terms    Citation data
Source: Barry, Carol L. 1998. Document representations and clues to document
relevance. Journal of the American Society for Information Science 49(14):1293-1303.
Document representation research
How relevant are these?

Titles
Title: Getting good grades in graduate school
Title: How to impress your advisor in graduate school
Title: Writing a dissertation
Title: The well-written graduate paper

How relevant are these?

Full text
Getting good grades in graduate school
The best way to get good grades is to study hard…
How to impress your advisor in graduate school
Never show up late for a meeting with your advisor…
Writing a dissertation
The first thing to do is to pick a topic that truly interests you…
The well-written graduate paper
Before finalizing your topic do a preliminary search on…
Document representation research
How relevant are these?

Titles    Abstracts    Citation data    Indexing terms

Full text    Full text    Full text    Full text

How relevant are these?
Utility studies - Indications that
user found relevant materials
• Citation & abstract databases
– User requests citations be formatted for
printing
– User requests citations be sent by e-mail
– User downloads citations
• Full-text databases
– Pull up the full text
– Print the article
– Download the article to their Blackberry
Utility studies - Indications that user found relevant materials

Search (chocolate) -> Short list

If user stops, may not have found a relevant article
Utility studies - Indications that user found relevant materials

Search -> Short list -> Modifies search

View full citation data for article
View full text of article
Download or print article

Assume that user found article relevant
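
A rough sketch of how a utility study might flag sessions from a transaction log. The action names are hypothetical, and treating any view, download, or print action as a sign of relevance is an assumption that mirrors the slide's own "assume".

```python
# Hypothetical log action names that count as signs the user found something.
RELEVANCE_SIGNALS = {
    "view_full_citation",
    "view_full_text",
    "download",
    "print",
}

def found_relevant(actions):
    """Return True if the session contains any assumed relevance signal."""
    return any(action in RELEVANCE_SIGNALS for action in actions)

stopped_early = ["search", "short_list"]
went_further = ["search", "short_list", "modify_search",
                "short_list", "view_full_text", "print"]

print(found_relevant(stopped_early))  # False: user stopped at the short list
print(found_relevant(went_further))   # True: user viewed and printed an article
```

The inference is only a proxy: a user may print an article and later find it useless, or find the answer in the short list itself.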
Characteristics of searches
that produce relevant
materials
• Subject searching
• Utilization of Boolean operators
• Search modification
• Increased time in display activities
• Use of a greater number of databases
Cooper, Michael D., and Hui-Min Chen. 2001. Predicting the relevance of a library catalog search.
Journal of the American Society for Information Science and Technology 52(10): 813-827.
Importance of abstract (1)
• Indication as to depth/scope of the article
Authors studied leg-hair count variations of Drosophila in Kawainui Marsh
• Delineates methodology--indication of reliability and validity
Random sampling in 40 sectors during March, June, September & December
• Gives indication as to content novelty
Greater variation in June
Importance of abstract (2)
• Basis for research may
indicate recency
American housing market was
selected because it is always
robust.
• Delineation of results
indicates "tangibility"
(important, useful data)
Authors concluded that
American teenagers listen to
rock music.
Types of abstracts

• Indicative
• Informative
• Critical (evaluative)

(Not common in
library databases)
Indicative abstract
Indicates what the document is about but doesn't report findings

Title: A review of the current literature on relevance.

Abstract: The author reviews the current literature on relevance.
Informative abstract
Acts as a substitute for the document
Title: The effects of library school on
the mental health of library students

Abstract: The authors performed longitudinal studies on 32 graduate
students in 8 library and information science programs and found a
significant increase in aberrant psychological traits over time.
(fictitious title and abstracts)
Abstract creation

• Author-produced
• Vendor-added
• Automated abstracting
Automated abstracting
1. Word counts
2. Remove stop words
3. Weight remaining words
according to frequency
4. Search for sentences with
highest density of most
frequently-occurring words
1. Word count
Title: Seasonal variations in the feral cat population of Fargo

the      81    summer        11    average     9
is       68    spring        11    concept     7
a        56    fall          11    per         8
to       42    monthly       10    over        9
cats     61    temperature   61    immediate   5
number   45    variation     12    implement   3
season   27    food          10    mortality   8
winter   11    availability  10    survival    9
2. Eliminate stop words
Title: Seasonal variations in the feral cat population of Fargo

the      81    summer        11    average     9
is       68    spring        11    concept     7
a        56    fall          11    per         8
to       42    monthly       10    over        9
cats     61    temperature   61    immediate   5
number   45    variation     12    implement   3
season   27    food          10    mortality   8
winter   11    availability  10    survival    9

Stop words eliminated: the, is, a, to, per, over
3. Rank by frequency
Title: Seasonal variations in the feral cat population of Fargo

cats         61    summer        11    average     9
temperature  61    spring        11    survival    9
number       45    fall          11    mortality   8
seasonal     27    monthly       10    concept     7
variation    12    food          10    immediate   5
winter       11    availability  10    implement   3
4. Search for sentences with highest density of high frequency words
Title: Seasonal variations in the feral cat population of Fargo

We found a significant seasonal variation in the number of cats.

The highest number of cats are found in the summer, the lowest number of cats in the winter.
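
The four steps can be sketched as a short extractive summarizer. This is an illustration under stated assumptions, not any vendor's method: the stop-word list is a tiny sample, the sentence splitter is naive, and "density" is taken here to mean the summed frequency of a sentence's non-stop words divided by its length in words.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "to", "per", "over", "in", "of",
              "we", "are", "was", "not"}

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, n_sentences=2):
    # Steps 1-2: count words, skipping stop words.
    counts = Counter(w for w in words(text) if w not in STOP_WORDS)

    # Steps 3-4: weight each word by its frequency and score each
    # sentence by the density of that weight per word.
    def density(sentence):
        tokens = words(sentence)
        if not tokens:
            return 0.0
        return sum(counts[t] for t in tokens) / len(tokens)

    sentences = split_sentences(text)
    top = sorted(sentences, key=density, reverse=True)[:n_sentences]
    # Keep the chosen sentences in their original order.
    return [s for s in sentences if s in top]

text = ("We found a significant seasonal variation in the number of cats. "
        "The highest number of cats are found in the summer, "
        "the lowest number of cats in the winter. "
        "Food availability was not measured.")

for sentence in summarize(text):
    print(sentence)
```

With this toy text the two sentences about the number of cats score highest, because "cats" and "number" are the most frequent non-stop words.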
Automated abstract
... The Children's Internet Protection Act (CIPA) sets
conditions on public libraries' receipt of federal financial
assistance for Internet access. ... It would not have been
possible for the broadcasting station to limit the use of
federal funds to all non-editorializing activities. ... The
instant Court distinguished Velazquez, restricting its
holding to situations in which the grantee is "pit[ted] . . .
against the Government. ... " Justice Stevens asserted
that the filtering condition was unconstitutional because
it distorted the normal usage of library Internet terminals
as sources of a wide array of information. ... A condition
mandating Internet filters distorts this mission by
"deny[ing] patrons access to constitutionally protected
speech that libraries would otherwise provide. ...
Relevance and
information overload
In this age of information
overload, tools to aid the user
in determining relevance are
increasingly critical.
