Dr Mrs Florence O. Entsua-Mensah, January 24, 2023
Session Outline
The key topics to be covered in the session are:
Recommended Reading
Understanding information retrieval
Topic One
• IR includes:
• Web search
• Searching your laptop
• Searching large corporate databases
Definition of IR
Information retrieval
➢ The technique & process of searching, recovering, &
interpreting information from large amounts of stored
data (MScience & Technology Dictionary).
Basic assumptions of IR
• Collection: A set of documents
– Assume it is a static collection for the moment
• Goal: Retrieve documents with information that is relevant
to the user’s information need and helps the user complete
a task.
• Precision : Fraction of retrieved documents that are
relevant to the user’s information need.
• Recall : Fraction of relevant documents in a collection that
are retrieved.
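The two measures above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `precision_recall` and the document IDs are assumptions made for the example, and relevance is treated as a simple yes/no judgement.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the system returned.
    relevant:  IDs of the documents relevant to the information need.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 5 relevant in the collection.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d3", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints: precision=0.50, recall=0.40
```

Two of the four retrieved documents are relevant (precision 2/4), and two of the five relevant documents were found (recall 2/5).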
Information Retrieval System (IRS)
Topic Two (2)
The purpose of an IR system
• To organize documents/records in a way that facilitates
easy access or retrieval of relevant information by its
users.
• IR systems retrieve bibliographic items, or texts that exactly match a
query, from full-text databases or multimedia information
sources.
Elements of an IRS
Types of IR systems
1. OPAC
➢ Searching library catalogues online
➢ Checking availability of library resources.
2. Online database
➢ Provide access to peer reviewed scholarly information resources
➢ Are subscription or fee-based services
3. Digital libraries and web information service
➢ Information is stored in digital formats
➢ Often free and accessed via the web
4. Web search engines
➢ Free search tools for web information retrieval
Everyday uses of IR systems
• The search for information from library OPACs
• Accessing information from bibliographic or full text databases e.g. Web of Science, LISA
• Access to e-books & e-journals (World public library at http://worldlibrary.net/ , Emerald at www.emeraldinsight.com)
• Access information from email services and mobile phones
• Searching for information on company or institutional intranets
• Access web information via URLs, search engines, and subject gateways (provide links to more academic, reliable information).
• Access information from social networking sites.
Activity 1.1
• Discuss the core elements of an information retrieval
system.
References
INFS 427: AUTOMATED INFORMATION RETRIEVAL
(1st Semester, 2021/2022)
Lecture 2:
Historical Developments in AIR
• We will discuss the various types of data and how they are
organized to enhance the information retrieval process.
Topic Three
(Sobrino, 2014)
• Document Clustering:
• The task of organizing a collection of documents, whose classification is
unknown, into meaningful groups (clusters) that are homogeneous
according to some notion of proximity (distance or similarity) among
documents (Tagarelli, 2009).
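As a rough illustration of grouping documents by proximity, the sketch below uses a simple single-pass ("leader") approach with Jaccard similarity over word sets. The technique, the `threshold` value, and the sample texts are assumptions chosen for the example. This is not the method described by Tagarelli.

```python
def jaccard(a, b):
    """Similarity between two token sets: shared tokens / all tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.2):
    """Put each document in the first cluster whose first member is similar."""
    clusters = []  # each cluster is a list of document positions
    token_sets = [set(text.lower().split()) for text in docs]
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            leader = token_sets[cluster[0]]
            if jaccard(tokens, leader) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


# Hypothetical collection of four very short documents.
docs = [
    "library catalogue search online",
    "online library catalogue access",
    "football match results today",
    "today football league results",
]
print(cluster_documents(docs))
# prints: [[0, 1], [2, 3]]
```

The two library documents share three of five distinct words, as do the two football documents, so each pair forms its own cluster.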
“tables”
Student ID   NAME     GENDER   SCORE
10901582     Reggie   Male     60       (RECORD)
• There are times when you search the UGCat and get no results.
• This simply means that your query/search yielded no results – the
search engine could not match your search to the available
collection/database.
• In a situation like this, you may have to reformulate your query or
search a different database.
reference source
databases databases
Author Price
Title of book Call number
Publisher’s name Accession number
Place of publication Keywords
Date of publication
Session 04
SUBJECT ANALYSIS & REPRESENTATION
Conceptual analysis: What is the subject of this document? What is it about?
Representation: What keyword can best represent this document?
(Bastida, 2016)
(Taylor, 2009)
Dr. (Mrs.) F. O. Entsua-mensah,
DIS/SCDE Slide 12
SA Cont’d
(Robare, 2004)
Perspectives of subject analysis
Subject analysis is used in two ways in the library and information
science (LIS) literature:
1. Relates to construction of indexing language and classification
systems.
2. Relates to the analysis of the topical content of a document
(which is our focus).
Thus, it determines the essence or the subject matter in
document texts, databases, controlled and natural
languages, information requests, and search strategies.
(Robare, 2004)
• Controlled vocabulary
• Thesauri (examples)
• Art & Architecture Thesaurus (AAT)
• Thesaurus of ERIC Descriptors
• Subject heading lists (examples)
• Library of Congress Subject Headings
• Sears List of Subject Headings
• Medical Subject Headings (MeSH)
(Robare, 2004)
• Advantages:
• provide access to the words used in bibliographic
records
• Disadvantages:
• cannot compensate for complexities of language and
expression
• cannot compensate for context
• Keyword searching is enhanced by assignment of
controlled vocabulary!
(Robare, 2004)
• Create a list of key words and concepts that would be translated into a
controlled vocabulary.
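A minimal sketch of this translation step is shown below. The mini-vocabulary, the entry terms, and the function name `translate_keywords` are all hypothetical and serve only to illustrate the idea. A real thesaurus or subject heading list is far larger and has its own rules.

```python
# Hypothetical mini-vocabulary: each preferred term lists the entry terms
# (synonyms, variants) that should be translated to it.
CONTROLLED_VOCABULARY = {
    "Adolescents": ["teenagers", "teens", "youth", "adolescents"],
    "Information retrieval": ["information retrieval", "ir", "document retrieval"],
}

# Reverse lookup: entry term -> preferred term.
ENTRY_TO_PREFERRED = {
    entry: preferred
    for preferred, entries in CONTROLLED_VOCABULARY.items()
    for entry in entries
}


def translate_keywords(keywords):
    """Split keywords into controlled terms and keywords with no match."""
    matched, unmatched = [], []
    for keyword in keywords:
        preferred = ENTRY_TO_PREFERRED.get(keyword.strip().lower())
        if preferred is None:
            unmatched.append(keyword)
        elif preferred not in matched:
            matched.append(preferred)
    return matched, unmatched


print(translate_keywords(["Teenagers", "teens", "IR", "web search"]))
# prints: (['Adolescents', 'Information retrieval'], ['web search'])
```

Keywords with no match are returned separately so that the indexer can decide whether to look for another controlled term.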
Lecture 5:
BIBLIOGRAPHIC FORMATS
Lecturer:
Dr. Mrs. Florence O. Entsua-Mensah (fentsua-mensah@ug.edu.gh)
Session Outline
The key topics to be covered in the session are:
• Bibliographic Formats
• The ISO 2709
• The MARC Format
(Chowdhury, 2010).
OR
(ISO, 2016)
Characteristics of the ISO 2709
• The length of the tag shall be three octets. The length in octets of
the other three parts in each directory entry shall be given by the
directory map (octets 20 to 22 in the record label).
• All elements in a directory shall have the same structure.
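The directory structure described above can be sketched as follows. This is an illustration under stated assumptions: the label is taken to be 24 octets with the base address of data in octets 12 to 16, the sample record uses a 4/5/0 directory map as in MARC-style records, and the data fields hold plain text without indicators or subfield codes. `build_record` is a hypothetical helper that exists only to make a small test record.

```python
FIELD_SEPARATOR = b"\x1e"   # terminates the directory and each data field
RECORD_SEPARATOR = b"\x1d"  # terminates the record


def parse_directory(record):
    """Return (tag, field_length, start) tuples from the record directory."""
    label = record[:24]
    base_address = int(label[12:17])   # where the data fields begin
    len_of_length = int(label[20:21])  # octet 20: size of "length of field"
    len_of_start = int(label[21:22])   # octet 21: size of "starting position"
    len_of_impl = int(label[22:23])    # octet 22: implementation-defined part
    entry_size = 3 + len_of_length + len_of_start + len_of_impl

    directory = record[24:base_address - 1]  # drop the closing separator
    entries = []
    for offset in range(0, len(directory), entry_size):
        entry = directory[offset:offset + entry_size]
        tag = entry[:3].decode("ascii")  # the tag is always three octets
        field_length = int(entry[3:3 + len_of_length])
        start = int(entry[3 + len_of_length:3 + len_of_length + len_of_start])
        entries.append((tag, field_length, start))
    return entries


def build_record(fields):
    """Assemble a tiny record (hypothetical helper, for the demo only)."""
    directory, data, start = b"", b"", 0
    for tag, value in fields:
        field = value.encode("ascii") + FIELD_SEPARATOR
        directory += f"{tag}{len(field):04d}{start:05d}".encode("ascii")
        data += field
        start += len(field)
    directory += FIELD_SEPARATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + 1  # +1 for record separator
    label = f"{record_length:05d}nam  22{base_address:05d}   4500"
    return label.encode("ascii") + directory + data + RECORD_SEPARATOR


record = build_record([("245", "Information retrieval"), ("260", "Accra")])
print(parse_directory(record))
# prints: [('245', 22, 0), ('260', 6, 22)]
```

Because every directory entry has the same structure, the parser can step through the directory in fixed-size slices once it has read the directory map from the label.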
(Galabova, Trencheva & Trenchev, 2009)
Reference fields
• Are used to hold reference data of a given record that may
be required for processing.
Field separators
• Each data field is terminated with a field separator symbol.
2 characters
• The record separator follows the field separator of the final data field of the record.
Statement of responsibility:
Topic One
(Xie, 2009)
Source: https://uva.libguides.com/searching_information
(Cleverdon, 1988)
(Xie, 2009)
(Kuhlthau, 2004)
Topic Two
(Xie, 2009)
Search Strategies
Topic Three
Topic 4
• Use OR to expand
your search, e.g. Adolescents OR Teenagers.
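One way to see why OR expands a search is to treat each term's results as a set of document IDs. The postings below are hypothetical. AND is included only as a contrast, on the assumption that it is the operator that narrows a search.

```python
# Hypothetical postings: which document IDs contain each search term.
postings = {
    "adolescents": {1, 2, 5},
    "teenagers": {2, 3, 7},
}

# OR: a document matches if it contains either term (set union).
either = postings["adolescents"] | postings["teenagers"]

# AND: a document matches only if it contains both terms (set intersection).
both = postings["adolescents"] & postings["teenagers"]

print(sorted(either))  # prints: [1, 2, 3, 5, 7]
print(sorted(both))    # prints: [2]
```

OR returns five documents where either term alone returns three, while AND returns only the one document that uses both words.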
restrictive set
(Khurana, 2014)
• The search engine visits (every) web page that it can find and
builds an index, which is a list of tokens (such as words) that
are associated with pages (images, audio, video, etc.).
• The part of the search engine that visits the pages is the CRAWLER.
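The index described above can be sketched as a mapping from each token to the pages that contain it. The example assumes the pages have already been fetched. The URLs, the page texts, and the function name `build_index` are made up for illustration, and tokens are simply lower-cased words split on spaces.

```python
from collections import defaultdict

# Hypothetical pages that a crawler has already fetched: URL -> page text.
pages = {
    "http://example.org/a": "Information retrieval systems",
    "http://example.org/b": "Web information and search engines",
}


def build_index(pages):
    """Map each token to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            index[token].add(url)
    return {token: sorted(urls) for token, urls in index.items()}


index = build_index(pages)
print(index["information"])
# prints: ['http://example.org/a', 'http://example.org/b']
print(index["retrieval"])
# prints: ['http://example.org/a']
```

At search time the engine looks up the query tokens in this index instead of scanning every page again.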
• Google
• Bing
• Yahoo
• Ask.com
• AOL.com
• Baidu
• DuckDuckGo
• Yandex
Topic One
(Chowdhury, 2010)
(Thakkar, 2015)
Topic One
Topic Two
(Xie, 2009)
The WWW & Web Information Retrieval
Topic 3
(Chowdhury, 2010)
Differences between traditional and web retrieval
Quality of information
• The quality of web information is uncertain since anyone can publish on
the web. Text retrieval systems comprise published information
resources with definite quality control.
Frequency of changes
• Web information changes frequently. The contents of text retrieval
systems are static and thus easy for a retrieval system to track
and retrieve.
Ownership
• Ownership of web resources varies: some are free, others require
permission or access rights, posing a challenge to retrieval.
Session 9
• The term "user" can refer to any person who interacts with
an information system to search for and select resources
he/she needs.
(UNT, 2017)
• Users of IRS fall into two major categories that are not
mutually exclusive:
1. Those who develop and evaluate IR systems and
services.
2. Those who consume them.
• The former are researchers and developers in disciplines
such as computing and information sciences, while the
latter are everyday users of the technology.
Actual users:
• those who are using the information service at a given time.
Potential Users:
• those who are not yet served by the information services.
Expected Users:
• those who not only have the privilege of using the information
service, but also have the intention of doing so.
Beneficiary users:
• Users who have derived some benefits from the information service.
(Chowdhury, 2010)
(UNT, 2017)
Session 10:
USER SEARCH INTERFACE
(Hearst, 2009)
Topic 3
(Hearst, 2009)
• Informative Document Surrogates
• Query Term Suggestions
• Query Transformations
• Results Ordering
Query Transformation
• Some search engines make subtle changes to queries to improve
results. Query transformation is usually opaque to the user.
• For example, Microsoft’s web search engine (i.e. Bing) automatically
converts words like “VS.” to “versus”. The lack of user control over this
feature is mitigated by the fact that the transformation nearly always
matches the searcher’s intention.
• Although query transformation can be a useful feature, it can
sometimes frustrate the user.
• For example, Google returns pages that contain people’s names for
which the middle initial is missing, even if the original query specifies
the middle initial. In this case, the transformed query (which the
user did not request) will frustrate a user who is trying to
distinguish between two persons with similar names.
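A query transformation of the “VS.” to “versus” kind can be sketched as a simple token rewrite. The rule table `REWRITES` and the function name `transform_query` are hypothetical. Search engines do not publish their actual rules, so this only illustrates the general idea.

```python
# Hypothetical rewrite rules, keyed by the lower-cased query token.
REWRITES = {"vs.": "versus", "vs": "versus"}


def transform_query(query):
    """Replace known abbreviations in a query, token by token."""
    tokens = query.split()
    return " ".join(REWRITES.get(token.lower(), token) for token in tokens)


print(transform_query("precision VS. recall"))
# prints: precision versus recall
```

The user never sees the rewritten query, which is why the transformation is described as opaque.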
This text is usually shown greyed out to signal that it is intended to be replaced by the
user’s text. The text within the form disappears when the user clicks in the form.
Topic 4
(Wilson, 2012)
The D-E-C-I-D-E Process (1)
• D: Determine the goal of the evaluation. What do you want to prove or examine?
• D: Decide how to deal with any ethical issues. Ethical issues are particularly
important when dealing with humans.
(Wilson, 2012)
The D-E-C-I-D-E Process (2)
(Wilson, 2012)
Summary
Dumais, S., Cutrell, E., Cadiz, J. J., Jancke, G., Sarin, R., & Robbins,
D. C. (2003). Stuff I’ve Seen: A System for Personal Information
Retrieval and Re-Use. Retrieved from
https://www.microsoft.com/en-us/research/wp-content/uploads/2003/01/siscore-sigir2003-final.pdf
College of Education
School of Continuing and Distance Education
2014/2015 – 2016/2017
Session Overview
• In recent years, the evaluation of Information Retrieval
Systems and of techniques for indexing, sorting, searching
and retrieving information has become increasingly
important (Saracevic, as cited in Kowalski, 2007).
UNDERSTANDING EVALUATION
Efficiency Effectiveness
(Chowdhury, 2010)
1. Designing the scope of evaluation
• This is where a detailed plan is set to form the basis
for the rest of the program.
• Set of Objectives that the given study will meet.
• Set the purpose and scope.
• How the evaluation will be conducted.
– Whether laboratory type setup or a real-life situation.
• What level will it be evaluated?
– Macro evaluation or Micro evaluation
• Probable constraints in terms of cost, staff time, etc.
Session Overview
• This session continues the discussion on the
various ways in which an AIRS may be evaluated.
An ideal system would achieve 100% recall and 100% precision, which is not
possible in practice.
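A small worked example can show why both measures rarely reach 100% together. The sketch below computes precision and recall after each position in a ranked result list. The ranking, the relevance judgements, and the function name `precision_recall_at_k` are assumptions made for illustration.

```python
def precision_recall_at_k(ranked, relevant):
    """Return (k, precision, recall) rows, one per position in the ranking."""
    relevant = set(relevant)
    total = len(relevant)
    hits = 0
    rows = []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        recall = hits / total if total else 0.0
        rows.append((k, hits / k, recall))
    return rows


# Hypothetical ranking of five documents; d1, d3 and d5 are relevant.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d3", "d5"]
for k, precision, recall in precision_recall_at_k(ranked, relevant):
    print(f"k={k}  precision={precision:.2f}  recall={recall:.2f}")
```

In this example recall reaches 1.00 only at k=5, where precision has dropped to 0.60, because retrieving more documents to find every relevant one also brings in non-relevant ones.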
(Lavrenko, 2013)
(Koehrsen, 2018)
Limitations of recall and precision
• Different users may want different levels of recall – a person
preparing a report on a given topic may prefer high recall.
Conversely, someone who needs to know just something about a
given topic may prefer low recall.
• Recall assumes that all relevant items have the same value,
but the value may be relative and vary from user to user,
and even from time to time with the same user.
– Both recall and precision rely on the relevance judgement of the user,
and this judgement may be subjective.
– A subjective view of relevance may also depend on the user’s
knowledge of the content at the time of the search.
• Therefore all pertinent items may be relevant but not all
relevant items may be pertinent.
(Chowdhury, 2010)
Limitations of recall and precision – contd.