
Information Retrieval Basics

Sergey Chernov
Center for the Study of New Media and Society
www.newmediacenter.ru

Information search in action

Vladimir Pekhtin

Alexey Navalny

Doct_z
4/11/2013 Sergey Chernov, Information Retrieval Basics

Public data


Resources and achievements


Search engines
Databases for property owners in Europe & USA
List of Deputies of State Duma
Man-hours invested in manual search and exploration
Results: 500+ news items, 150 articles, 20 interviews and videos; Pekhtin resigned from the Committee of Ethics



Outline for today


Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session


Information in numbers
Facebook: 900 million users
Twitter: 500 million
Flickr: 50 million
Delicious: 5 million
Web: 1 trillion

Information Retrieval
Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Information Domains
Desktop
Enterprise Web (Intranet)
Public Web (Internet)
Sources shown in the diagram: Social Networks, Web CMS, DVD, Online Shops, DB, Disk, FShare, E-mail, People, Online Libraries, Web Sites

Information Retrieval System


Crawler: Downloads/collects the data
Indexer: Processes the data and builds the Inverted Index
Ranker: Evaluates user queries against the index and computes a list of (ranked) results
Display: Organizes and displays the results to the user, facilitates navigation through the result set
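The four components above can be sketched in a few lines of Python. This is only a toy illustration, not how a production engine works: the document collection, the whitespace tokenizer, and the term-frequency scoring are all assumptions made for the example, and the "crawler" is replaced by a hard-coded dictionary.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a real indexer would do much more."""
    return text.lower().split()

def build_inverted_index(docs):
    """Indexer step: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index

def rank(index, query):
    """Ranker step: score documents by the summed frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Stand-in for the crawler output: a hypothetical toy collection.
docs = {
    "d1": "information retrieval finds documents",
    "d2": "retrieval of information from large collections of documents",
    "d3": "weather in Seattle",
}

index = build_inverted_index(docs)
results = rank(index, "information retrieval")

# Display step: show the ranked list to the user.
for doc_id, score in results:
    print(doc_id, score)
```

The inverted index is what lets the ranker look only at documents that contain a query term instead of scanning the whole collection.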

Sec. 19.4.1

User Needs
Need [Broder 2002, Rose and Levinson 2004]
Informational: want to learn about something (example query: Low hemoglobin)
Navigational: want to go to that page (example query: United Airlines)
Transactional: want to do something (web-mediated)
Access a service (example query: Seattle weather)
Downloads (example query: Mars surface images)
Shop (example query: Canon S410)
Gray areas (example query: Car rental Brasil)
Find a good hub
Exploratory search: "see what's there"


How far do people look for results?

(Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)



How to evaluate results? CRAAP

(Source: California State University, Chico, http://www.csuchico.edu/lins/handouts/eval_websites.pdf)

Currency: How old is the material? Does the age matter? For history, older information may be better; for medicine, fresh material is needed.
Relevance: How well does it fit? Does it answer my question? Is it detailed enough?
Authority: Who wrote it? Is the author qualified to write? What about contact information?
Accuracy: Is it supported by evidence? Refereed? Verifiable? Unbiased? Clearly written?
Purpose: What can you infer about the author's message? Is it fact, opinion or propaganda?

Where to search?

Web
Subject directories
Intranet and Desktop
Digital libraries
Social platforms
Databases and Hidden Web
Business analytics
Wikipedia
Photo stocks
Open datasets and Linked Data
Open Gov Data

Web


http://webupon.com/searchengines/top-five-subject-directories-andhow-to-use-them/

Subject directories


Intranet


Desktop


Digital libraries


Social platforms


Databases and Hidden Web


Business Analytics


Wikipedia


Photo stocks


Linked Data


Open Data


Outline for today


Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session

Search is a journey

Is that all?

http://www.flickr.com/photos/morville


Exploratory search

Lookup: Question answering, Fact retrieval, Known-item search, Navigational search. Lasts for seconds.
Learn: Knowledge acquisition, Comprehension, Comparison, Discovery, Serendipity.
Investigate: Incremental search, Driven by uncertainty, Non-linear behavior, Result analysis. Lasts for hours.

Exploratory behavior
Learn: About the search topic, About the collection
Reformulate query: Broadening, Narrowing, Changing the focus
Socialize: Looking for experts, Collaborative search
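Broadening and narrowing a query can be illustrated with simple Boolean matching. The sketch below is an assumption-laden toy: the four "documents" are made up, narrowing is modeled as requiring all terms (AND), and broadening as accepting any term (OR). Real engines reformulate in richer ways, such as synonyms or facets.

```python
# Hypothetical toy collection: document id -> text.
docs = {
    1: "car rental brasil",
    2: "car rental seattle",
    3: "bike rental brasil",
    4: "seattle weather",
}

def matches_all(query):
    """Narrowing: a document must contain every query term (AND)."""
    terms = set(query.lower().split())
    return sorted(d for d, text in docs.items() if terms <= set(text.split()))

def matches_any(query):
    """Broadening: a document may contain any of the query terms (OR)."""
    terms = set(query.lower().split())
    return sorted(d for d, text in docs.items() if terms & set(text.split()))

start = matches_all("rental")               # initial query
narrowed = matches_all("rental brasil")     # add a term, require both
broadened = matches_any("rental weather")   # add a term, accept either

print(start, narrowed, broadened)
```

Adding a term under AND can only shrink the result set, while adding a term under OR can only grow it, which is why the same action (typing one more word) can serve either purpose.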

Search tools
Web search engines
Personalized search
Faceted search
Review services
Geo-services
Question answering
Scientific search
Domain-specific search
Recommender systems

Web search engine


Query suggestions

Snippets


Web search engine (2)


Web search engine (3)


Search for pages that link to a URL: the link: operator
link:google.com/images

Search for pages that are similar to a URL: the related: operator
related:nytimes.com

Search for results from a specific site: the site: operator
site:strelkainstitute.com
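Operator queries like the ones above can also be assembled programmatically. The following is a minimal sketch: the helper names are hypothetical, and the base URL with a `q` parameter is an assumption about one particular engine. Operator support differs between engines and changes over time, so treat the output as something to verify by hand.

```python
from urllib.parse import urlencode

def operator_query(operator, value, extra_terms=""):
    """Join an operator and its value without a space, e.g. site:example.com."""
    query = f"{operator}:{value}"
    if extra_terms:
        query = f"{query} {extra_terms}"
    return query

def search_url(query, base="https://www.google.com/search"):
    """Build a search URL; the base URL and the `q` parameter are assumptions."""
    return f"{base}?{urlencode({'q': query})}"

q = operator_query("site", "strelkainstitute.com", "information retrieval")
url = search_url(q)
print(q)
print(url)
```

Note that the operator and its value are written without a space between them; with a space, an engine would typically treat the value as an ordinary search term.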

Personalized search
Personalization is the modeling of a user's preferences from previous interactions:
Queries, click-through analysis, eye tracking

Personalized search is usually implemented as:
Re-ranking and filtering of the search results
Personalized query expansion
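Re-ranking from click-through data can be sketched as follows. Everything here is an illustrative assumption: the click history, the topic labels, the base scores, and the additive bonus with `weight=0.1` are invented for the example and do not describe any specific engine.

```python
from collections import Counter

def build_profile(clicked_results):
    """Model the user's preferences: count past clicks per topic."""
    return Counter(topic for _, topic in clicked_results)

def rerank(results, profile, weight=0.1):
    """Re-rank (doc, topic, base_score) tuples with a bonus per past click on the topic."""
    def personalized_score(item):
        _, topic, base_score = item
        return base_score + weight * profile[topic]
    return sorted(results, key=personalized_score, reverse=True)

# Hypothetical click-through history: (clicked page, topic).
history = [
    ("jaguar-xk.html", "cars"),
    ("f-type.html", "cars"),
    ("zoo.html", "animals"),
]
profile = build_profile(history)

# Hypothetical results for the ambiguous query "jaguar", with engine scores.
results = [
    ("jaguar-cat.html", "animals", 0.9),
    ("jaguar-cars.html", "cars", 0.85),
    ("jaguar-os.html", "software", 0.8),
]
personalized = rerank(results, profile)
for doc, topic, score in personalized:
    print(doc, topic, score)
```

In this toy run the car page overtakes the animal page because the user clicked on car-related results more often in the past.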

Faceted search
It's about Result Analysis!
(Figure labels: facet, facet values)

Faceted search (2)

It's about Query Reformulation!
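Both roles of facets, result analysis and query reformulation, can be shown with a small sketch. The catalog, the field names, and the helper functions below are hypothetical; a real system would compute these counts inside the search index.

```python
from collections import Counter

# Hypothetical result set: each item carries facet fields ("brand", "type").
items = [
    {"name": "Camera A", "brand": "BrandX", "type": "compact"},
    {"name": "Camera B", "brand": "BrandX", "type": "dslr"},
    {"name": "Camera C", "brand": "BrandY", "type": "dslr"},
]

def facet_counts(items, facet):
    """Result analysis: how many results fall under each facet value."""
    return Counter(item[facet] for item in items)

def filter_by(items, facet, value):
    """Query reformulation: keep only results with the chosen facet value."""
    return [item for item in items if item[facet] == value]

brand_counts = facet_counts(items, "brand")
dslr_only = filter_by(items, "type", "dslr")
dslr_brand_counts = facet_counts(dslr_only, "brand")

print(brand_counts)
print([item["name"] for item in dslr_only])
print(dslr_brand_counts)
```

The counts summarize what is in the result set before the user reads any single result, and choosing a facet value narrows the query without typing new keywords.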

Review services


Geo-services


Question answering


Scientific search


Scientific Search (2)


Domain-specific search


Recommender systems


Outline for today


Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session

Case 1: finding a research paper


Case 2: planning a trip


Case 3: looking for an expert


Case 4: market analysis


Outline for today


Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session

Practical assignment
Construct 3 information needs relevant to your everyday experience (preparing for an interview, choosing a learning course, doing homework, etc.)
Search for the information, using as many sources and tools as possible
Share your experience