
Search Analytics for Fun and Profit
An Event Apart
Chicago, Illinois
August 27, 2007
Lou Rosenfeld

Who I Am

Information architecture consultant to Fortune 500s
Publisher and founder, Rosenfeld Media
Blog at
Co-author, Information Architecture for the World Wide Web (3rd ed., 2006; O'Reilly)
New book: Search Analytics for Your Site: Conversations with your customers (2008; Rosenfeld Media)

Anatomy of a Search Log
(from Google Search Appliance)

Critical elements in pink: IP address, time/date stamp, query, and # of results:

XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET
HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET
UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET
commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17

The Zipf Curve:
Short Head, Middle Torso, Long Tail
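As a sketch, the critical elements can be pulled out of such a log line with a small parser. The regex and the sample line below are illustrative assumptions modeled on the excerpts above (IP, timestamp, GET path, status, bytes, result count, search time), not the appliance's documented format; the query is assumed to live in a `q=` parameter of the GET path.

```python
import re
import urllib.parse

# Illustrative pattern modeled on the sample lines above:
# IP - - [timestamp] "GET <path> HTTP/1.1" status bytes num_results seconds
LOG_RE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
    r'"GET (?P<path>\S+) HTTP/1\.[01]" '
    r'(?P<status>\d+) (?P<bytes>\d+) (?P<results>\d+) (?P<secs>[\d.]+)'
)

def parse_line(line):
    """Extract the critical elements: IP, time/date stamp, query, # of results."""
    m = LOG_RE.match(line)
    if not m:
        return None
    # Assumes the query is the q= parameter inside the GET path.
    q = re.search(r'[?&]q=([^&]*)', m.group('path'))
    return {
        'ip': m.group('ip'),
        'timestamp': m.group('ts'),
        'query': urllib.parse.unquote_plus(q.group(1)) if q else '',
        'results': int(m.group('results')),
    }

# Hypothetical sample line in the same shape as the excerpts above.
sample = ('10.0.0.104 - - [10/Jul/2006:10:25:48 -0800] '
          '"GET /search?q=campus+map&output=xml HTTP/1.1" 200 8283 146 0.16')
```

Run over a full log, this yields the per-query records everything later in the talk depends on.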

Keep It In Proportion

7218  campus map
5859  map
5184  im west
4320  library
3745  study abroad
3690  schedule of courses
3584  bookstore
3575  spartantrak
3229  angel
3204  cata

What's the Sweet Spot?

Rank   Cumul. %   Count   Query
1      1.40       7218    campus map
14     10.53      2464    housing
42     20.18      1351    webenroll
98     30.01      650     computer center
221    40.05      295     msu union
500    50.02      124     hotels
7877   80.00      7       department of surgery
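The "Cumul. %" column is simply a running share: at each rank, what fraction of all searches is accounted for by the queries at that rank and above. A minimal sketch of that computation, assuming you already have a count per unique query:

```python
def cumulative_share(counts):
    """Given per-query counts, return (rank, cumulative % of all searches,
    count) rows, most frequent query first -- the 'sweet spot' table."""
    total = sum(counts)
    rows, running = [], 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        rows.append((rank, round(100 * running / total, 2), c))
    return rows
```

Scanning the output for the rank where the cumulative share crosses, say, 50% tells you how few queries in the Short Head repay most of your tuning effort.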

Topical Patterns and Seasonal Changes

Where Will You Capture Search Queries?

1. The search logs that your search engine naturally captures and maintains as searches take place
2. Search keywords or phrases that your users execute, that you capture into your own local store
3. Search keywords or phrases that your commercial search solution captures, records, and reports on (Mondosoft, Visual Sciences, Ultraseek, Google Appliance, etc.)
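Option 2 can be as simple as appending each executed query to a local file as part of handling the search request. A minimal, hypothetical sketch (the CSV layout and function name are assumptions, not a prescribed format):

```python
import csv
import time

def log_query(path, query, num_results):
    """Append one executed query to a local CSV store: timestamp, query,
    and number of results returned (capture option 2)."""
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow(
            [time.strftime('%d/%b/%Y:%H:%M:%S'), query, num_results])
```

A flat file like this is crude, but it is yours: no fight with whoever "owns" the engine's logs when analysis time comes.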

Querying your Queries:
Getting started

1. What are the most frequent unique queries?
2. Are frequent queries retrieving quality results?
3. What are the click-through rates per frequent query?
4. What is the most frequently clicked result per query?
5. Which frequent queries retrieve zero results?
6. What are the referrer pages for frequent queries?
7. Which queries retrieve popular documents?
8. What interesting patterns emerge in general?

Tune your Questions:
From generic to specific

Netflix asks:
1. Which movies are most frequently searched?
2. Which of them are most frequently clicked through?
3. Which of them are least frequently added to the queue?
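Questions 1 and 5 are the easiest starting points, and both reduce to counting. A sketch, assuming each parsed log record is a dict with (at least) `query` and `results` keys; the record shape is an assumption for illustration:

```python
from collections import Counter

def frequent_queries(records, top=10):
    """Question 1: the most frequent unique queries (case-folded)."""
    return Counter(r['query'].lower() for r in records).most_common(top)

def zero_result_queries(records):
    """Question 5: queries that retrieved zero results, most frequent first."""
    zero = Counter(
        r['query'].lower() for r in records if r['results'] == 0)
    return zero.most_common()

# Hypothetical records for illustration.
records = [
    {'query': 'campus map', 'results': 146},
    {'query': 'Campus Map', 'results': 146},
    {'query': 'webenroll', 'results': 0},
]
```

Case-folding before counting matters: "Campus Map" and "campus map" are the same information need and should be tallied together.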

Diagnose This:
Fixing and improving the UX

1. User Research
2. Content Development
3. Interface Design: search entry interface, search results
4. Retrieval Algorithm Modification
5. Navigation Design
6. Metadata Development

User Research:
What do they want?…

SA is a true expression of users' information needs (often surprising: e.g., SKU #s at a clothing retailer; URLs at IBM)
Provides context by displaying aspects of single search sessions

User Research:
…what else do they want?…

BBC provides reports to determine other terms searched within the same session (tracked by cookies)

User Research:
…who wants it?…

Specific segments' needs as determined by:
 Security clearance
 IP address
 Job function
 Account information
Alternatively, you may be able to extrapolate segments directly from SA:
 Pages they initiate searches from
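The BBC-style "other terms in the same session" report starts with grouping queries by session identifier. A minimal sketch, assuming each event carries a `session_id` (e.g., from a cookie) and a `query`; both field names are illustrative assumptions:

```python
from collections import defaultdict

def queries_per_session(events):
    """Group queries by session ID so you can see what else a user
    searched for within the same session."""
    sessions = defaultdict(list)
    for e in events:
        sessions[e['session_id']].append(e['query'])
    return dict(sessions)
```

Sessions with more than one query are the interesting ones: the second and third queries often reveal what the first one failed to find.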

User Research:
…who wants it?…

BBC's top queries report from the children's section of the site

User Research:
…and when do they want it?

Time-based variation (and clustered queries) from MSU
 By hour, by day, by season
 Helps determine "best bets"
 Also can help tune main page and other editorial content
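Time-based variation falls out of bucketing query events by hour (or day, or month). A sketch, assuming each record carries a `timestamp` string in the log format shown earlier (`10/Jul/2006:10:25:46 -0800`); the record shape is an assumption:

```python
from collections import Counter
from datetime import datetime

def queries_by_hour(records):
    """Bucket query events by hour of day to expose time-based variation
    (swap ts.hour for ts.date() or ts.month for daily/seasonal views)."""
    hours = Counter()
    for r in records:
        ts = datetime.strptime(r['timestamp'], '%d/%b/%Y:%H:%M:%S %z')
        hours[ts.hour] += 1
    return hours
```

Running this per query term, not just in aggregate, is what surfaces the seasonal spikes worth turning into "best bets" or front-page editorial.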

Content Development:
Do we have the right content?

Analyze 0-result queries
 Does the content exist?
 If so, there are titling, wording, metadata, or indexing problems
 If not, why not?

Content Development:
Are we featuring the right stuff?

Track clickthroughs to determine which results should rise to the top (example: SLI Systems)
Also suggests which "best bets" to develop to address common queries
BBC removes navigation pages from search results

Search Entry Interface Design:
"The Box" or something else?

Identify "dead end" points (e.g., 0 hits, 2000 hits) where assistance could be added
Query syntax helps you select search features to expose (e.g., use of Boolean operators)

Search Results Interface Design:
Which results where?

The #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)
From SLI Systems
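Findings like "#10 beats #6 through #9" come from tallying clickthroughs by result position. A sketch, assuming click events carry a `position` field (1-10) and that you know the total number of searches; both are illustrative assumptions:

```python
from collections import Counter

def clickthrough_by_position(click_events, total_searches):
    """Clickthrough rate per result position: what fraction of searches
    ended in a click on the result shown in each slot (1-10)."""
    clicks = Counter(e['position'] for e in click_events)
    return {pos: clicks[pos] / total_searches for pos in range(1, 11)}
```

A bump at position 10 like the one described above is a layout signal: the last visible result draws the eye, so what you place there matters.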

Search Results Interface Design:
How to sort results?

The Financial Times has found that users often include dates in their queries
Obvious but effective improvement: allow users to sort by date

Search System:
What to change?

Add functionality: the Financial Times added spell checking
Retrieval algorithm modifications:
 The Financial Times weights company names higher
 Netflix determines better weighting for unique terms and phrases
Deloitte, Barnes & Noble, and Vanguard demonstrate that basic improvements (e.g., Best Bets) are insufficient (and justify increased $$$)

Navigation:
Any improvements?

Michigan State University builds an A-Z index automatically based on frequent queries

Navigation:
Where does it fail?

Track and study pages (excluding the main page) where search is initiated
 What do they search? (e.g., acronyms, jargon)
 Are there other issues that would cause a "dead end"? (e.g., tagging and titling problems on these pages)
 Are there user studies that could test/validate? (e.g., "Where did you want to go next?")

Metadata Development:
How do searchers express their needs?

Tone and jargon (e.g., "cancer" vs. "oncology," "lorry" vs. "truck," acronyms)
Syntax (e.g., Boolean, natural language, keyword)
Length (e.g., number of terms per query; Long Tail queries are longer and more complex than Short Head queries)
Everything we know from analyzing folksonomic tags applies here, and vice versa

Metadata Development:
Which values and attributes?

Uncover hierarchy and identify:
 Metadata values (e.g., mobile vs. cell)
 Metadata attributes (e.g., genre, …)
 Content types (e.g., spec, price sheet)
SA combines with AI tools for clustering, enabling concept searching and thesaurus development
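The length claim (Long Tail queries run longer than Short Head queries) is easy to check from per-query counts. A sketch, assuming a dict mapping each unique query string to its frequency and an arbitrary head size; both parameters are assumptions for illustration:

```python
def mean_terms(queries):
    """Average number of whitespace-separated terms per query."""
    if not queries:
        return 0.0
    return sum(len(q.split()) for q in queries) / len(queries)

def head_vs_tail_length(counts_by_query, head_size=100):
    """Compare mean query length in the Short Head (top-N queries by
    frequency) against the Long Tail (everything below rank N)."""
    ranked = sorted(counts_by_query, key=counts_by_query.get, reverse=True)
    return mean_terms(ranked[:head_size]), mean_terms(ranked[head_size:])
```

If the tail mean is markedly higher, that is a hint the tail holds richer, more specific needs, and richer vocabulary for your metadata work.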

Metadata Development:
Leveraging differences in the curve

Variations in information needs emerge between Short Head and Long Tail
Example: the Deloitte intranet's "known-item" queries are common; research topics are infrequent
[Zipf curve: known-item queries in the Short Head, research queries in the Long Tail]

Organizational Impact:
Educational opportunities

"Reverse engineer" performance problems
 Vanguard tests "best" results for common queries
 Determines why these results aren't retrieved
 Demonstrates problems, solutions, and benefits to content owners/authors
 Sandia Labs does the same, only with top results that are losing rank in search results pages

Organizational Impact:
Reexamining assumptions

The Financial Times learns about breaking stories from their logs by monitoring spikes in company names and individuals' names and comparing them with their current coverage
Discrepancy = possible breaking story; a reporter is assigned to follow up
Next step? Assign reporters to "beats" that emerge from SA

SA as User Research Method:
Sleeper, but no panacea

Benefits
 Non-intrusive
 Inexpensive and (usually) accessible
 Large volume of "real" data
 Represents actual usage patterns
Drawbacks
 Provides an incomplete picture of usage: was the user satisfied at session's end?
 Difficult to analyze: where are the commercial tools?
Complements qualitative methods (e.g., persona development, task analysis, field studies)

SA Headaches:
What gets in the way?

Problems*
 Lack of time
 Few useful tools for parsing logs, generating reports
 Tension between those who want to perform SA and those who "own" the data (chiefly IT)
 Ignorance of the method
 Hard work and/or boredom of doing analysis
Most of these are going away…
* From summer 2006 survey (134 responses), available at book site.

Please Share Your SA Knowledge:
Visit our book-in-progress site

Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2008)
Site URL:
Feed URL:

Contact Information

Louis Rosenfeld
Rosenfeld Media, LLC
705 Carroll Street, #2L
Brooklyn, NY 11215 USA