
Physical World

Persons, Places, Things, Ideas

describe

that exist in the

Data/Document Creation Process

Descriptions

Data/Document Storage System
Data Objects

describe

Attributes

represent

make up

such as

Crawlers

such as

... gather all of the text on a given web page, then continue gathering text on all of the pages linked to the given page, then follow the links on those pages, and so on.
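The crawling behavior described above can be sketched as a breadth-first traversal. This is only an illustration: the `fetch` callable and the in-memory `PAGES` table are hypothetical stand-ins for real page retrieval, which the map does not specify.

```python
from collections import deque

def crawl(start, fetch):
    """Gather text from `start`, then from the pages it links to, and so on.

    `fetch` is a hypothetical callable that returns (text, links) for a page.
    """
    seen = {start}
    queue = deque([start])
    gathered = {}
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        gathered[url] = text
        for link in links:
            # Skip pages already queued or visited so link cycles terminate.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return gathered

# Toy in-memory "web" (made-up data): page -> (text, outgoing links)
PAGES = {
    "a": ("text of a", ["b", "c"]),
    "b": ("text of b", ["a"]),
    "c": ("text of c", []),
}

result = crawl("a", lambda url: PAGES[url])
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.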

text analysis, link analysis, metadata analysis

are “sensed” by Representations
Stored proxies of physical world objects and ideas

send data to Sensors
... are the input devices of the data/document storage system

collect data for

send data to Indexers
Creates a lookup table from all of the data fed into it. If a graph is not in use, all of the data fed into it is stored as entries in the lookup table. I-search and PLS are existing technologies.
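One common form of such a lookup table is an inverted index that maps each word to the documents containing it. The sketch below assumes that form; the map does not say how I-search or PLS build their tables, and the `DOCS` data is made up.

```python
from collections import defaultdict

def build_index(documents):
    """Build a word -> set of document ids lookup table (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

def lookup(index, word):
    """Return the sorted ids of documents that contain `word`."""
    return sorted(index.get(word.lower(), set()))

# Made-up documents for illustration
DOCS = {1: "Search the web", 2: "Web crawlers gather text"}
index = build_index(DOCS)
hits = lookup(index, "Web")
```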


Algorithms

Organizing Principle


Objects referenceable by a URI such as html, text, gif, jpeg, png, avi, mpeg, real, quicktime, wav, pdf, rss, xml, xul files

Groups of WWW Objects in context


Metadata and Data
such as XML, XUL, or RDF. Data is self-describing and separate from the form it will eventually take.

Interpreter


Turns the user’s query into something readable by the index. Could handle: boolean operators, spell checking, word stemming, case folding, internationalization, thesauri, phrase searching, related terms.
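A minimal sketch of an interpreter covering three of the listed capabilities (boolean operators, case folding, phrase searching) plus a deliberately naive stand-in for word stemming. The token format and the suffix list are assumptions made for illustration, not part of the map.

```python
import re

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

def naive_stem(word):
    """Very crude stemming: strip a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def interpret(query):
    """Turn a raw query string into (kind, value) tokens for the index."""
    tokens = []
    # Quoted text is captured as a phrase; everything else is split on whitespace.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            tokens.append(("PHRASE", phrase.lower()))
        elif word in BOOLEAN_OPERATORS:
            tokens.append(("OP", word))
        else:
            tokens.append(("TERM", naive_stem(word.lower())))
    return tokens

tokens = interpret('Crawlers AND "Search Engines" NOT indexing')
```

Operators are recognized only in upper case here, so a lower-case "and" is treated as an ordinary term. That is one possible design choice, not a requirement.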

could receive data directly from an

Articulator or ‘Layout Engine’
Combines form and content.

sends data to a

could receive data directly from an

makes use of

could receive data directly from an

sent to an

could be redrawn by the

can influence

Input Device

could be the same as the

Output Device
Any device that can receive data and render it for the user.

could be redrawn by an

Keywords

Views

contribute to the

control the display order of

Active Searchers
Scope is reduced by specifying additional restrictions upfront. Best for data retrieval.


can be stored in Options

create

can contribute to the specification of

Customization Settings

made using a

could contain

can specify

contains an

Could be by relevance, creation date, modification date, alphabetical, by source, by media type, involve progressive disclosure, pagination, or custom look and feel (themes)

could be

could specify

such as keyverbs, boolean operators, data type, media type, language or domain


options that are stored either locally or on a server

make People
... can be described by their goals, age, gender, income, geographic location, education, hardware and software, connection speed, member status
could be

of a Queries
specification of

in order to get Source Answers

could be contained in

contains data that can improve the specification of

Behavioral History click paths, session tracking, decision tracking

can be stored in

Personalization Profiles

can be stored in

Passive Searchers
Scope is reduced by reducing choices from a found set, in a sequence of ever smaller sets. Best for document retrieval.

rely on

is made up of

exist in the

involve

browser type and version, language setting, IP address, current page, monitor size, color settings, javascript capability

Environmental State


Information to enable

Actions

Understanding Internet Search
Concepts, Systems and Processes
Search Concept Map, version 1.2
8 August 1999
Designed by Matt Leacock
The suggested starting point for reading is “People”. I would like to acknowledge Hugh Dubberly for his many suggestions, and Ken Hickman and Paul Pangaro for their contributions.

User Context

Goals
Could be organized as cultural responses to human needs (Malinowski): roughly, Food, Kinship, Shelter, Protection and Activities, Training, Hygiene. Or more simply: Work, Play, Learn; Food, Clothing, Shelter, Love.

attain their

Data/Document Retrieval Process

that can improve the specification of

large collections of profiles that can be analyzed to produce trend information

Group Profiles

should provide the

behaviors and states that are stored either locally or on a server

such as

are represented by

make up
are gathered by

organized into Feeds

provide structure to Tables Databases News, Products, Related Links, Trademarks, Recirculation, Stocks


Objects residing in databases that are not referenceable by a simple URI, such as company names, addresses, phone numbers, names of people, personal profiles, horoscopes, credit card numbers, product names, prices, maps, event listings, news clips, weather

Groups of WWW Pages in context

describe

generated by

Metadata
data that contains information about its format

is used to create

provide multiple views of data in a database

could be contained in a

A repository of data that could be stored in RDF format. Can be updated independently of the index that references it. Entries in the graph can reference each other. For instance, a request for a stock quote of an entry could point to the symbol, the symbol could point to the company name and the current quote, the current quote could point to the quote history, and the quote history could point to a quote histogram.
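The stock quote example above can be sketched as a small graph whose entries point to one another. The dictionary layout, entry names, and all values below are invented for illustration; the map only suggests that the data could be stored in RDF format and does not define a schema.

```python
from collections import deque

# Hypothetical graph: each entry holds a value and the entries it points to.
GRAPH = {
    "quote_request": {"value": None, "points_to": ["symbol"]},
    "symbol": {"value": "XYZ", "points_to": ["company_name", "current_quote"]},
    "company_name": {"value": "Example Corp", "points_to": []},
    "current_quote": {"value": 42.0, "points_to": ["quote_history"]},
    "quote_history": {"value": [40.0, 41.5, 42.0], "points_to": ["quote_histogram"]},
    "quote_histogram": {"value": None, "points_to": []},
}

def follow_references(graph, start):
    """List every entry reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        entry = queue.popleft()
        order.append(entry)
        for target in graph[entry]["points_to"]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

reachable = follow_references(GRAPH, "quote_request")
```

Because the entries live in the graph rather than in the index, a value such as the current quote could be updated without rebuilding the index that points to it.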

Data/Document Retrieval System

are indexed by

3rd Parties

generate

Alexa, Bigfoot, At Hand

can act as

WWW Sites

make up

Submissions
are gathered by

are organized into

Hierarchical Taxonomies

provide structure to

WWW Site Directory Databases
Yahoo!, Open Directory, Infoseek

could be contained in a

Graph

are organized by

generated by

Editors

create and maintain

Usenet, Yahoo!, Open Directory, Dewey Decimal, Library of Congress

describe

are indexed by

WWW Users

can be

can assign quality ratings such as editor’s choice or cool site of the day

A collection of technologies that attempt to rank the relative value of a set of results. Google ranks results during the indexing process by examining the relationships created by the hyperlinks in the documents. More popular sites score higher. Clever does the ranking during post processing. It uses an iterative process that looks at the popularity of a site (much like Google) but does so in regard to the keywords used in the search.
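As a rough illustration of the idea that more popular sites score higher, the sketch below ranks pages by how many other pages link to them. This is a deliberately simplified stand-in, not Google's or Clever's actual algorithm, and the `LINKS` data is made up.

```python
def rank_by_inbound_links(links):
    """Rank pages by the number of distinct other pages linking to them.

    `links` maps each page to the list of pages it links to.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                scores[target] = scores.get(target, 0) + 1
    return sorted(scores, key=lambda page: (-scores[page], page))

# Made-up link structure: page -> pages it links to
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranking = rank_by_inbound_links(LINKS)
```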

Amazon ranks related books and CDs by keeping track of the purchasing habits of groups (group profiles). Grapevine allows users to actively rank results (a customization process) but then attempts to guess what the user will want based on what it has learned (a personalization process). Direct Hit analyzes the time spent by users on a given result. It also keeps track of whether or not a user returns to the results page after viewing a result. If the user returns, the result receives a lower score.
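The behavior described for Direct Hit can be sketched as a score built from time spent on a result, reduced when the user returns to the results page. The averaging and the `return_penalty` factor are assumptions chosen for illustration; the map gives no formula.

```python
def behavior_score(visits, return_penalty=0.5):
    """Average time spent on a result, discounted for visits that bounced back.

    `visits` is a list of (seconds_on_result, returned_to_results) pairs.
    `return_penalty` is a hypothetical multiplier applied when the user
    came back to the results page after viewing the result.
    """
    if not visits:
        return 0.0
    total = 0.0
    for seconds, returned in visits:
        total += seconds * (return_penalty if returned else 1.0)
    return total / len(visits)

score = behavior_score([(60, False), (20, True)])
```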

are

paid, volunteer

could be contained in a

WWW Objects

make up

WWW Pages

make up
are gathered by

Page Databases

are organized into

Sorted Lists
alphabetically, by relevance, by modification date

provide structure to
WWW Document Databases
Inktomi, Google, Alta Vista, Lycos

are indexed by

Relevance Ranking

can be done by

are organized by

generated by

are used to create

used by

Any processes done on the data returned from the index. Could involve sorting, ordering, or compiling the data retrieved from the index.

create an Index

sends data to a
generates
If a graph is in place, the index functions as a lightweight lookup table.

Analyzers

Post Processors
generates

sends data to an

If no graph is used, the index contains all of the data in addition to the lookup table, in a list format.

Results Data

such as

Raw Data
such as HTML streams. Form and content are merged.

Scraper
Scrapes unwanted form information from the data stream. Classifies remaining data.

Aggregator

Templates
Provide the architecture, or “form”, of the data for the creation of the results page. The templates can exist in multiple, localized formats.
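A small sketch of templates that supply the form while the results data supplies the content, with one template per locale. The locale keys, template wording, and field names are hypothetical; Python's standard `string.Template` is used only as a convenient example mechanism.

```python
from string import Template

# Hypothetical localized templates: the "form" of a results summary line.
TEMPLATES = {
    "en": Template("$count results for '$query'"),
    "es": Template("$count resultados para '$query'"),
}

def render(locale, **content):
    """Combine a localized template (form) with results data (content)."""
    return TEMPLATES[locale].substitute(**content)

english = render("en", count=3, query="maps")
spanish = render("es", count=3, query="maps")
```

Keeping the wording in the templates means the same results data can be rendered in any supported locale without changing the retrieval code.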

could be sent to a

sends data to an

organize the form of

Combines or interleaves data from multiple sources.
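One simple way an Aggregator might interleave results from several sources is round-robin merging, sketched below. The round-robin policy and the sample source lists are assumptions; the map does not say how interleaving is done.

```python
from itertools import zip_longest

def interleave(*sources):
    """Merge result lists round-robin, skipping sources that run out early."""
    missing = object()  # sentinel so that None remains a valid result item
    merged = []
    for group in zip_longest(*sources, fillvalue=missing):
        merged.extend(item for item in group if item is not missing)
    return merged

# Made-up result lists from three hypothetical sources
merged = interleave(["a1", "a2", "a3"], ["b1"], ["c1", "c2"])
```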

displays
Results Pages
Any UI page or widget that displays the results data.
