TECHNIQUES
Unit – I
INTRODUCTION
Information Retrieval – Early Developments – The IR Problem
– The User's Task – Information versus Data Retrieval – The IR
System – The Software Architecture of the IR System – The
Retrieval and Ranking Processes – The Web – The e-Publishing
Era – How the Web Changed Search – Practical Issues on the
Web – How People Search – Search Interfaces Today –
Visualization in Search Interfaces.
Information Retrieval
• Information retrieval is finding material of an
unstructured nature that satisfies an
information need from within large
collections.
• Information retrieval (IR) is concerned with
representing, searching, and manipulating
large collections of electronic text and other
human-language data.
Information Retrieval
Web Search
• For example, ranking algorithms are concerned more with the counts of
word occurrences than with whether a word is a noun or an adjective.
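As a minimal sketch of this idea, the snippet below assumes a plain whitespace tokenizer and a hypothetical scoring rule (the score is the raw number of query-word occurrences in each document). It is an illustration, not the method of any particular engine, and it never looks at parts of speech.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; no part-of-speech tagging is done.
    return text.lower().split()

def count_score(query, document):
    # Hypothetical rule: score = total occurrences of the query words in the document.
    counts = Counter(tokenize(document))
    return sum(counts[word] for word in tokenize(query))

def rank(query, documents):
    # Sort document indices by descending score; ties keep their original order.
    scores = [count_score(query, doc) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: -scores[i])

docs = [
    "fast cars and fast roads",
    "a slow car",
    "fast food is fast and cheap and fast",
]
print(rank("fast", docs))  # [2, 0, 1]
```

Document 2 ranks first because "fast" occurs three times in it. Note that "car" and "cars" are treated as different words here, since no stemming is applied.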
2. Evaluation
Precision & Recall
Precision
• Are the retrieved documents relevant?
• Precision is the proportion of retrieved documents that are
relevant
Recall
• Are all the relevant documents retrieved?
• Recall is the proportion of relevant documents that are retrieved.
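Both measures can be computed from two sets of document IDs: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. The sketch below assumes the convention that an empty denominator yields 0.0; other conventions exist. The document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document IDs returned by the system
    relevant:  iterable of document IDs judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumed convention: return 0.0 when a denominator would be zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents retrieved, 3 of them relevant; 6 relevant documents exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d4", "d7", "d8", "d9"])
print(p, r)  # 0.75 0.5
```

Retrieving more documents tends to raise recall but can lower precision, which is why the two measures are usually reported together.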
3. Emphasis on users and their information needs
• Text queries are often poor descriptions of what the user actually
wants, compared with a request to a database system, such as a query for
the balance of a bank account.
Retrieval and ranking process (diagram labels):
• Storing data in the repository
• Indexing (for easy retrieval and ranking)
• Retrieval process
• Retrieved documents
• Expansion of the query
• Ranking
Query operations:
1. Spelling corrections and elimination of terms from the query
2. Query expansion
3. Eliminating stop words, stemming
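The indexing, query-operation and ranking steps can be sketched in a few lines. This is a toy under stated assumptions: the stop-word list is a tiny hypothetical one, `stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, spelling correction and query expansion are omitted, and documents are ranked by the summed frequency of matching query terms.

```python
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}

def stem(word):
    # Crude suffix stripping (an assumption, not a real stemming algorithm):
    # removes a few common English endings from sufficiently long words.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(text):
    # Query/document operations: tokenize, drop stop words, stem the rest.
    return [stem(t) for t in text.lower().split() if t not in STOP_WORDS]

def build_index(documents):
    # Inverted index: term -> {doc_id: frequency of the term in that doc}.
    index = defaultdict(Counter)
    for doc_id, text in enumerate(documents):
        for term in normalize(text):
            index[term][doc_id] += 1
    return index

def search(query, index):
    # Retrieve every document containing at least one query term, then rank
    # by the summed frequency of the query terms (highest first).
    scores = Counter()
    for term in normalize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common()]

docs = [
    "Indexing the documents in the repository",
    "Ranking retrieved documents",
    "Query expansion and spelling correction",
]
index = build_index(docs)
print(search("ranking documents", index))  # [1, 0]
```

Because the same `normalize` function is applied to documents and queries, "ranking" in the query matches "Ranking" in document 1 and "documents" matches both documents 0 and 1; document 1 matches two query terms, so it is ranked first.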
The Web
• With the rapid growth of the Internet, more
information is available on the Web, and Web
information retrieval presents additional technical
challenges compared with classic information
retrieval due to the heterogeneity and size of the Web.
• Web information retrieval is unique due to its
dynamism, the variety of languages used, duplication, high
linkage, ill-formed queries and the wide variance in the
nature of users.
• Many software tools are available for Web information
retrieval, such as Google, Yahoo and many other search agents.
E-PUBLISHING ERA
• Since its inception, the Web has become a huge
success: well over 20 billion pages are now
available and accessible on the Web, and more than
one fourth of humanity now accesses the Web on
a regular basis.
• Articles can be published on the Web easily, without any
time delay.
How the Web Changed Search
IMPACTS
• Characteristics and variety of documents (hyperlinks and their connections):
• Text
• Images
• Graphs
• GIFs
• Videos