Table Of Contents

Preliminaries
Example of a query
Outline
Classic IR
Determining query results
IR on the Web
Typical home page: non-running text
Typical home page: Non-running text
The big challenge
What’s different about the Web?
The bigger challenge
Why don’t the users get what they want?
Google output: mouse trap
Google output: trap mice
Quantifying the quality of results
Classic evaluation of IR systems
Evaluation in the Web context
Web IR tools
Web IR tools (cont.)
General purpose search engines
Some ranking criteria
Connectivity analysis
Graph representation for the Web
Query-independent ranking: Motivation for PageRank
Definition of PageRank [BP’98]
PageRank (cont.)
Output from Google: princess diana
Query-dependent ranking: the neighborhood graph
HITS [K’98]
Intuition
HITS details
Output from HITS: jobs
Output from HITS: +jaguar +car
Problems & solutions
Modified HITS algorithms
Output from modified HITS: +jaguar +car
User study [BH’98]
PageRank vs. HITS
Open problems
More on graph representation
Graph representation usage
Example: SRC Connectivity Server [BBHKV’98]
URL database
Web graph factoid
Index building
Reasons for duplicates
Uses of duplicate information
2 Types of duplicate filtering
Fine-grain: Basic mechanism
The basics of a solution
Shingling
Defining resemblance
Sampling minima
Implementation
If we need only high resemblance
Real implementation
Probability that two documents are deemed fungible
Features vs. full sketch
Fine-grain duplicate elimination: open problems and related work
Input: set of URLs
Example
Coarse-grain: Basic mechanism
A definition of mirroring
Different pre-filtering techniques
Problem with IP addresses
Number of hosts with same IP address vs. mirror probability
IP based pre-filtering algorithms
URL string based pre-filtering algorithms
URL string based algorithms (cont.)
Paths + connectivity (conn)
Hostname connectivity
Experiments
Precision up to rank 25,000
Combined approach (combined)
Precision vs. relative recall
Web host graph
Example of a component
Component size distribution
Coarse-grain duplicate filtering: Summary and open problems
Adding pages to the index
Queuing discipline
Load balancing
Hierarchical directories
Dealing with heterogeneous sources
Example: a shopping robot
Jango input example
Jango output
Direct query to K&L Wine
What price decadence?
Web IR Tools
Search-by-example
Output from Google: related:www.ebay.com
Output from Alexa: www.ebay.com
Connectivity based solutions
Algorithm Companion
Building neighborhood graph N
Refinement 1: Limit out-degree
Co-citation algorithm
User study
Collaborative filtering
Lots of projects
Why do we care?
Comparison of search engines
Comparing Search Engine Sizes
URL sampling
Sampling via queries [BB’98]
Estimate relative sizes
Selecting a random page
Checking if an engine has a page
Results of the BB’98 study
Crawling strategies are different!
Quality: A general definition [HHMN’99]
Estimating quality by sampling
Missing pieces
Sampling pages (almost) according to PageRank
Random walk effectiveness
Most frequently visited pages
Most frequently visited hosts
Results for index quality
Results for index quality/page
Insights from the data
How often do people view a page?
Query log statistics [SHMM’98]
Lots of things we didn’t even touch
Final conclusions
Acknowledgements
icde

Published by: jabadia83 on Nov 20, 2010
Copyright: Attribution Non-commercial

11/07/2011
