You are on page 1of 158

Google’s PageRank and Beyond

:
The Science of Search Engine Rankings

Amy Langville
langvillea@cofc.edu

Department of Mathematics College of Charleston Charleston, SC Citadel 10/16/07

Outline
• Introduction to Information Retrieval • Elements of a Search Engine • Link Analysis • Current Issues in Web Search

monks in scriptoriums Gutenberg’s printing press Franklin’s public libraries Dewey’s decimal system Card catalog 1940s-1950s 1960s 1989 Computer Salton’s SMART system Berner-Lee’s WWW . 12th cent. 1450 1700s 1872 cave paintings Beowulf invention of paper. need (query) B. for particular info. A.Short History of IR IR = search within doc.D. A. 7-8th cent.D. coll. C.

S ystem for the M echanical A nalysis and R etrieval of T ext Harvard 1962 – 1965 Cornell 1965 – 1970 Gerard Salton • Implemented on IBM 7094 & IBM 360 • Based on matrix methods .

landing gear) .g.Term–Document Matrices Start with dictionary of terms Words or phrases ( e..

Term–Document Matrices Start with dictionary of terms Words or phrases Index Each Document Humans scour pages and mark key terms ( e..g. landing gear) .

.Term–Document Matrices Start with dictionary of terms Words or phrases Index Each Document Humans scour pages and mark key terms Count fij = # times term i appears in document j ( e.g. landing gear) .

.. ... .g. ⎜ ⎝ . ... ... . . . Doc n f1n f2n .. landing gear) f12 f22 . . fmn ⎞ ⎟ ⎟ ⎟ = Am×n ⎠ . . . fm2 .Term–Document Matrices Start with dictionary of terms Words or phrases Index Each Document Humans scour pages and mark key terms Count fij = # times term i appears in document j Term–Document Matrix ⎛ Doc 1 Term 1 f11 Term 2⎜ f21 ⎜ .. Term m fm1 Doc 2 ( e. . . .

. q2. .Query Matching Query Vector qT = (q1. qm) qi = 1 if Term i is requested 0 if not . ..

. .Query Matching Query Vector qT = (q1. . qm) qi = 1 if Term i is requested 0 if not How Close is Query to Each Document? . q2..

q2. how close is q to each column Ai? A1 q A2 θ1 θ2 A3 .. .Query Matching Query Vector qT = (q1.. .e. . qm) qi = 1 if Term i is requested 0 if not How Close is Query to Each Document? i.

.e. . .. q2..Query Matching Query Vector qT = (q1. qm) qi = 1 if Term i is requested 0 if not How Close is Query to Each Document? i. how close is q to each column Ai? A1 q A2 θ1 θ2 qT Ai Use δi = cos θi = q Ai A3 .

how close is q to each column Ai? A1 q A2 θ1 θ2 qT Ai Use δi = cos θi = q Ai Rank documents by size of δi A3 Return Document i to user when δi ≥ tol ... .Query Matching Query Vector qT = (q1.e. . qm) qi = 1 if Term i is requested 0 if not How Close is Query to Each Document? i. . q2.

• LATENT SEMANTIC INDEXING . Patent No. April 5. U.853. Patent No. 4. 1994.109. 5. June 13. U. 1989.839. Computerized cross-language document retrieval using latent semantic indexing.S.S.Susan Dumais’s Improvement Approximate A with a lower rank matrix Effect is to compress data in A • 2 patents for Bell/Telcordia — — Computer information retrieval using latent semantic structure.301.

Traditional IR Pros • Finds hidden connections .

Traditional IR Pros • Finds hidden connections • Can be adapted to identify document clusters — Text mining applications .

Traditional IR
Pros • Finds hidden connections • Can be adapted to identify document clusters — Text mining applications • Performs well on document collections that are Small + Homogeneous + Static

Traditional IR
Pros • Finds hidden connections • Can be adapted to identify document clusters — Text mining applications • Performs well on document collections that are Small + Homogeneous + Static Cons • Rankings are query dependent — Rank of each doc is recomputed for each query

Traditional IR
Pros • Finds hidden connections • Can be adapted to identify document clusters — Text mining applications • Performs well on document collections that are Small + Homogeneous + Static Cons • Rankings are query dependent — Rank of each doc is recomputed for each query • Only semantic content used — Can be spammed + Link structure ignored

Traditional IR Pros • Finds hidden connections • Can be adapted to identify document clusters — Text mining applications • Performs well on document collections that are Small + Homogeneous + Static Cons • Rankings are query dependent — Rank of each doc is recomputed for each query • Only semantic content used — Can be spammed + Link structure ignored • Difficult to add & delete documents .

Traditional IR Pros • Finds hidden connections • Can be adapted to identify document clusters — Text mining applications • Performs well on document collections that are Small + Homogeneous + Static Cons • Rankings are query dependent — Rank of each doc is recomputed for each query • Only semantic content used — Can be spammed + Link structure ignored • Difficult to add & delete documents • Finding optimal compression requires empirical tuning .

. • hezbollah: 9. 942. 12980. 809. . . 567. . 339.Trad. . IR applied to Web the pre-1998 Web Index . • border patrol: 4. 15158. • global warming: 178. 1103. . 445532. . . . 12.

146 adjacency list. 55. 38. 76. 116 iterative aggregation updating. approximate. 82. 201 bounce back. 105–107 iterative. 142 anchor text. 84–86 BadRank. 132. 7 bibliometrics. 25. 37. Albert. 195 censored Markov chain. 194 censorship. 73. 44 clustering search results. 182 censored chain. 104. 92 dangling node PageRank. 201 authority vector. 65. 30 compressed matrix storage. 29. 47–48 Amazon’s traffic rank. 179 a vector. 142 absolute error. 80 A9. 132 authority matrix. 201 Boldi. 165 asymptotic rate of convergence. 201 authority Markov chain. 77 adjacency matrix. 75 back button. 91 Alexa traffic ranking. 194 censored distribution. Vannevar. 124 α parameter. 110 aperiodic. 176 Application Programming Interface (API). 83 HITS. 108 personalized PageRank power method. 5–6. 102–104 arc. 201 co-reference. 205 Browne. 104 censored chains. Sergey. 172 complex networks. 37. 27 authority. Steve. 127. Paolo. Albert-Laszlo. 47. 169 advertising. 201 Babbage. 185 absorbing states. 138 algebraic multiplicity. 71. 142–143 co-citation. 10 Campbell. 162 Ces` aro summability. 30 Berry. 3. 168. 125 Atlas of Cyberspace. 197 aggregation. 84–86 bowtie structure. 38. 102–104 exact. 89–90 Adar. 195 aggregated transition matrix. 40 Aitken extrapolation. 49 quadratic extrapolation. 23 canonical form. stochastic matrix. 201 Ando.PuPstyleBook March 3. 136 asymptotic convergence rate. 131 BlockRank. 116. 48. 105 Aitken extrapolation. 102 cloaking. Kenneth. 92 Chien. 201 Collatz–Wielandt formula. 182 characteristic polynomial. 136 connected components. 141 Barabasi. 104 absorbing Markov chains. 105 aggregated transition probability. 54. 133 aperiodic Markov chain. Murray. 32. 146–147 Ces` aro sequence. reducible matrix. 94–97 approximate. 115. 36. 120. 41. 79–80 adaptive PageRank method. 201 Arrow. 156 Chebyshev extrapolation. 45 aggregated chain. 92 Brin. 59. 157 algorithm PageRank. 2006 Index k-step transition matrix. 93 query-independent HITS. 123. 197 aggregated chains. 123 bipartite undirected graph. 201 authority score. Michael. 7 Bush. 101. 75. 144–146. Eytan. Charles. 97 approximate aggregation. 133 . 185 accuracy. 197 aggregation theorem. 33. 134 Brezinski. 117. 94–97. Claude. 107–109 partition. 155 Condorcet. 79 Boolean model. 41. 104–105 exact vs. 123. 102 blog. 76 condition number. 109–112 aggregation in Markov chains. 119. Lord John.

200. 1103. (15. 12980. 942. • border patrol: 4.000 in total) . . .100. . 339.Trad. . 12. . .000 in total) . . (33. . • global warming: 178. 809.000 in total) . . 567.700. . . 445532. IR applied to Web the pre-1998 Web Index . (8. . . . . • hezbollah: 9. 15158.

700.Trad. (33.200. 12980. . . . . . . • border patrol: 4. 1103. . 339. IR applied to Web the pre-1998 Web Index . (8. 942. . • global warming: 178. 12. .000 in total) . .000 in total) . 15158.000 in total) too many results per search term easily spammed . 445532. 809. .100. . • hezbollah: 9. (15. 567. . . .

.. All men felt themselves to be the masters of an intact and secret treasure. ..Sentiments about the pre-1998 Web Yahoo • hierarchies of sites • organized by humans Best Search Techniques • word of mouth • expert advice Overall Feeling of Users • Jorge Luis Borges’ 1941 short story. the first impression was one of extravagant happiness. seemed almost intolerable. As was natural. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible. There was no personal or world problem whose eloquent solution did not exist in some hexagon. The Library of Babel When it was proclaimed that the Library contained all books. this inordinate hope was followed by an excessive depression.

1998: enter Link Analysis • uses hyperlink structure to focus the relevant set • combine traditional IR score with popularity score Page and Brin 1998 Kleinberg .

Thousands of sources from around the world ensure anyone with an Internet connection can stay informed.1998 . or research the background of a particular corporation.. It’s the Swiss Army knife of information retrieval. chair. The Simpsons • “I can’t imagine life without Google News.Matt Groening. The diversity of viewpoints available is staggering.. I use it.Garry Trudeau. translate a phrase. creator and executive producer. I use it to ego-surf. I may use it to check the spelling of a foreign name.” . Anytime I want to find out anything. but it might as well be.” .Michael Powell. On the run-up to a deadline.” . to acquire an image of a particular piece of military hardware. to find the exact quote of a public figure. check a stat. enter Link Analysis Change in User Attitudes about Web Search Today • “It’s not my homepage. Doonesbury . cartoonist and creator. I use it to read the news. Federal Communications Commission • “Google is my rapid-response research assistant.

Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR .

Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR How is the Web different from other document collections? .

– over 10 billion pages. average page size of 500KB – 20 times size of Library of Congress print collection – Deep Web .400 X bigger than Surface Web .Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR How is the Web different from other document collections? • It’s huge.

com change daily . – over 10 billion pages.400 X bigger than Surface Web • It’s dynamic.Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR How is the Web different from other document collections? • It’s huge. average page size of 500KB – 20 times size of Library of Congress print collection – Deep Web . – size changes: billions of pages added each year 23% of . – content changes: 40% of pages change in a week.

Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR How is the Web different from other document collections? • It’s huge. – no standards. and spammers! 23% of .com change daily . – over 10 billion pages. link rot. formats – errors. review process. average page size of 500KB – 20 times size of Library of Congress print collection – Deep Web .400 X bigger than Surface Web • It’s dynamic. – content changes: 40% of pages change in a week. falsehoods. – size changes: billions of pages added each year • It’s self-organized.

review process. falsehoods. average page size of 500KB – 20 times size of Library of Congress print collection – Deep Web .com change daily . link rot. – no standards. – size changes: billions of pages added each year • It’s self-organized. and spammers! A Herculean Task! 23% of .400 X bigger than Surface Web • It’s dynamic. – content changes: 40% of pages change in a week. – over 10 billion pages. formats – errors.Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR How is the Web different from other document collections? • It’s huge.

but it’s hyperlinked ! – Vannevar Bush’s 1945 memex 23% of . – over 10 billion pages.com change daily .Web Information Retrieval IR before the Web = traditional IR IR on the Web = web IR How is the Web different from other document collections? • It’s huge.400 X bigger than Surface Web • It’s dynamic. – size changes: billions of pages added each year • It’s self-organized. – no standards. formats – errors. each about 500KB – 20 times size of Library of Congress print collection – Deep Web . review process. and spammers! • Ah. – content changes: 40% of pages change in a week. falsehoods. link rot.

Elements of a Web Search Engine WWW Crawler Module Page Repository User Quer query-in depende n t Res Indexing Module ults Query Module Ranking Module ies Indexes Special-purpose indexes Content Index Structure Index .

Elements of a Web Search Engine WWW Crawler Module Page Repository User Quer pendent Res Indexing Module ults Query Module Ranking Module ies query-inde Indexes Special-purpose indexes Content Index Structure Index .

The Ranking Module (generates popularity scores) • Measure the importance of each page .

The Ranking Module (generates popularity scores) • Measure the importance of each page • The measure should be Independent of any query — Primarily determined by the link structure of the Web — Tempered by some content considerations .

Elements of a Web Search Engine WWW Crawler Module Page Repository User Quer ependent Res ults Indexing Module Query Module Ranking Module ies query-ind Indexes Special-purpose indexes Content Index Structure Index .

The Ranking Module (generates popularity scores) • Measure the importance of each page • The measure should be Independent of any query — Primarily determined by the link structure of the Web — Tempered by some content considerations • Compute these measures off-line long before any queries are processed .

Elements of a Web Search Engine WWW Crawler Module Page Repository User Quer nt Res ults Indexing Module depende Query Module Ranking Module ies query-in Indexes Special-purpose indexes Content Index Structure Index .

The Ranking Module (generates popularity scores) • Measure the importance of each page • The measure should be Independent of any query — Primarily determined by the link structure of the Web — Tempered by some content considerations • Compute these measures off-line long before any queries are processed • Google’s PageRank petitors c technology distinguishes it from all com- .

The Ranking Module

(generates popularity scores)

• Measure the importance of each page • The measure should be Independent of any query — Primarily determined by the link structure of the Web — Tempered by some content considerations • Compute these measures off-line long before any queries are processed • Google’s PageRank petitors
c

technology distinguishes it from all com-

Google’s PageRank = Google’s $$$$$

.

.

.

.

How To Measure “Importance” Landmark Result Paper Survey Paper—Big Bib .

How To Measure “Importance” Landmark Result Paper Survey Paper—Big Bib Authorities Hubs .

How To Measure “Importance” Landmark Result Paper Survey Paper—Big Bib Authorities • Good hubs point to good authorities Hubs • Good authorities are pointed to by good hubs .

HITS Hypertext Induced Topic Search (1998) Determine Authority & Hub Scores • ai = authority score for Pi • hi = hub score for Pi Jon Kleinberg .

⎣.HITS Hypertext Induced Topic Search (1998) Determine Authority & Hub Scores • ai = authority score for Pi • hi = hub score for Pi Successive Refinement • Start with hi = 1 for all pages Pi ⇒ ⎡ ⎤ 1 ⎢1⎥ ⎥ . h0 = ⎢ .⎦ 1 Jon Kleinberg .

⎦ • Define Authority Scores (on the first pass) 1 • hi = hub score for Pi Jon Kleinberg ai = j :Pj →Pi hj . ⎣. • Start with hi = 1 for all pages Pi ⇒ h0 = ⎢ .HITS Hypertext Induced Topic Search (1998) Determine Authority & Hub Scores • ai = authority score for Pi ⎡ ⎤ 1 Successive Refinement ⎢1⎥ ⎥ .

j :Pj →Pi 1 Pi → Pj an Lij = 0 Pi → Pj .HITS Hypertext Induced Topic Search (1998) Determine Authority & Hub Scores • ai = authority score for Pi • hi = hub score for Pi Jon Kleinberg ⎡ ⎤ 1 Successive Refinement ⎢1⎥ ⎥ .⎦ • Define Authority Scores (on the first pass) 1 ⎡ ⎤ a1 ⎢ a2 ⎥ ⎥ = L T h0 . h j ⇒ a1 = ⎢ ai = ⎦ ⎣ . • Start with hi = 1 for all pages Pi ⇒ h0 = ⎢ . ⎣. .

HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj .

HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 .

HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 .

HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 • a2 = LT h1 .

HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 • a2 = LT h1 • h2 = La2 . .. .

. Combined Iterations • A = LT L (authority matrix) .HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 • a2 = LT h1 • h2 = La2 ..

.. ak = Aak−1 → e-vector (direction) Combined Iterations • A = LT L (authority matrix) .HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 • a2 = LT h1 • h2 = La2 .

HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 • a2 = LT h1 • h2 = La2 . ak = Aak−1 → e-vector hk = Hhk−1 → e-vector (direction) (direction) Combined Iterations • A = LT L (authority matrix) • H = LLT (hub matrix) . ..

. ak = Aak−1 → e-vector hk = Hhk−1 → e-vector (direction) (direction) Combined Iterations • A = LT L (authority matrix) • H = LLT (hub matrix) !! May not be uniquely defined if A or H is reducible !! .HITS Algorithm Refine Hub Scores • hi = j :Pi→Pj aj ⇒ h1 = La1 Lij = 1 Pi → Pj 0 Pi → Pj Successively Re-refine Authority & Hub Scores • a1 = LT h0 • h1 = La1 • a2 = LT h1 • h2 = La2 . .

Do direct query matching .Compromise 1.

Do direct query matching 2. Build neighborhood graph .Compromise 1.

Do direct query matching 2.Compromise 1. Build neighborhood graph 3. Compute authority & hub scores for just the neighborhood .

Pros & Cons Advantages • Returns satisfactory results — Client gets both authority & hub scores .

Pros & Cons Advantages • Returns satisfactory results — Client gets both authority & hub scores • Some flexibility for making refinements .

Pros & Cons Advantages • Returns satisfactory results — Client gets both authority & hub scores • Some flexibility for making refinements Disadvantages • Too much has to happen while client is waiting .

Pros & Cons Advantages • Returns satisfactory results — Client gets both authority & hub scores • Some flexibility for making refinements Disadvantages • Too much has to happen while client is waiting — Custom built neighborhood graph needed for each query .

Pros & Cons Advantages • Returns satisfactory results — Client gets both authority & hub scores • Some flexibility for making refinements Disadvantages • Too much has to happen while client is waiting — Custom built neighborhood graph needed for each query — Two eigenvector computations needed for each query .

Pros & Cons Advantages • Returns satisfactory results — Client gets both authority & hub scores • Some flexibility for making refinements Disadvantages • Too much has to happen while client is waiting — Custom built neighborhood graph needed for each query — Two eigenvector computations needed for each query • Scores can be manipulated by creating artificial hubs .

HITS Applied −→ −→ .

Ask.com in WSJ .

.

Google’s PageRank (Lawrence Page & Sergey Brin 1998) The Google Goals • Create a PageRank r(P ) that is not query dependent Off-line calculations — No query time computation • Let the Web vote with in-links But not by simple link counts — One link to P from Yahoo! is important — Many links to P from me is not • Share The Vote Yahoo! casts many “votes” — value of vote from Y ahoo! is diluted If Yahoo! “votes” for n pages — Then P receives only r(Y )/n credit from Y .

Google’s PageRank (Lawrence Page & Sergey Brin 1998) The Google Goals • Create a PageRank r(P ) that is not query dependent Off-line calculations — No query time computation • Let the Web vote with in-links But not by simple link counts — One link to P from Yahoo! is important — Many links to P from me is not • Share The Vote Yahoo! casts many “votes” — value of vote from Y ahoo! is diluted If Yahoo! “votes” for n pages — Then P receives only r(Y )/n credit from Y .

Google’s PageRank (Lawrence Page & Sergey Brin 1998) The Google Goals • Create a PageRank r(P ) that is not query dependent Off-line calculations — No query time computation • Let the Web vote with in-links But not by simple link counts — One link to P from Yahoo! is important — Many links to P from me is not • Share The Vote Yahoo! casts many “votes” — value of vote from Y ahoo! is diluted If Yahoo! “votes” for n pages — Then P receives only r(Y )/n credit from Y .

Google’s PageRank (Lawrence Page & Sergey Brin 1998) The Google Goals • Create a PageRank r(P ) that is not query dependent Off-line calculations — No query time computation • Let the Web vote with in-links But not by simple link counts — One link to P from Yahoo! is important — Many links to P from me is not • Share The Vote Yahoo! casts many “votes” — value of vote from Y ahoo! is diluted If Yahoo! “votes” for n pages — Then P receives only r(Y )/n credit from Y .

PageRank The Definition r (P ) = P ∈BP r (P ) |P | BP = {all pages pointing to P } |P | = number of out links from P .

P2.PageRank The Definition r (P ) = P ∈BP r (P ) |P | BP = {all pages pointing to P } |P | = number of out links from P Successive Refinement Start with r0(Pi) = 1/n for all pages P1. .. Pn . . .

PageRank The Definition r (P ) = P ∈BP r (P ) |P | BP = {all pages pointing to P } |P | = number of out links from P Successive Refinement Start with r0(Pi) = 1/n r0(P ) |P | for all pages P1. . Pn Iteratively refine rankings for each page r 1 (P i ) = P ∈BPi . . P2. ..

. . P2. Pn Iteratively refine rankings for each page r 1 (P i ) = P ∈BPi r2(Pi) = P ∈BPi .PageRank The Definition r (P ) = P ∈BP r (P ) |P | BP = {all pages pointing to P } |P | = number of out links from P Successive Refinement Start with r0(Pi) = 1/n r0(P ) |P | r1(P ) |P | for all pages P1. . .

Pn Iteratively refine rankings for each page r 1 (P i ) = P ∈BPi r2(Pi) = P ∈BPi rj +1(Pi) = rj (P ) |P | . . . .PageRank The Definition r (P ) = P ∈BP r (P ) |P | BP = {all pages pointing to P } |P | = number of out links from P Successive Refinement Start with r0(Pi) = 1/n r0(P ) |P | r1(P ) |P | . P2.. .. P ∈BPi for all pages P1.

rk (P2)..In Matrix Notation After Step k . . rk (Pn)] — πT k = [rk (P1). . .

In Matrix Notation After Step k . . rk (Pn)] — πT k = [rk (P1).. 1/|Pi| if i → j 0 otherwise — πT k +1 = πT kH where hij = . . rk (P2).

. rk (Pn)] — πT k = [rk (P1).In Matrix Notation After Step k . .. 1/|Pi| if i → j 0 otherwise — πT k +1 = πT kH where hij = — PageRank vector = πT = lim πT k = eigenvector for H k →∞ Provided that the limit exists . rk (P2).

Tiny Web 1 2 3 H = 6 5 P1 ⎜ P2 ⎜ ⎜ P3 ⎜ ⎜ P4 ⎜ ⎜ ⎜ P5 ⎜ ⎝ P6 ⎛ P1 P2 P3 P4 P5 P6 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ ⎜ P3 ⎜ ⎜ P4 ⎜ ⎜ ⎜ P5 ⎜ ⎝ P6 ⎛ P1 P2 1/2 P3 1/2 P4 0 P5 0 P6 0 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ P4 ⎜ ⎜ ⎜ P5 ⎜ ⎝ P6 ⎛ P1 P2 1/2 0 P3 1/2 0 P4 0 0 P5 0 0 P6 0 ⎟ 0 ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ⎞ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ ⎜ P5 ⎜ ⎝ P6 ⎛ P1 P2 1/2 0 1/3 P3 1/2 0 0 P4 0 0 0 P5 0 0 1/3 P6 0 0 0 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ ⎝ P6 ⎛ P1 P2 1/2 0 1/3 0 P3 1/2 0 0 0 P4 0 0 0 0 P5 0 0 1/3 1/2 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ ⎟ ⎠ ⎞ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ P6 ⎛ P1 P2 1/2 0 1/3 0 0 P3 1/2 0 0 0 0 P4 0 0 0 0 1/2 P5 0 0 1/3 1/2 0 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ 1/2 ⎟ ⎠ ⎞ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 ⎛ P1 P2 1/2 0 1/3 0 0 0 P3 1/2 0 0 0 0 0 P4 0 0 0 0 1/2 1 P5 0 0 1/3 1/2 0 0 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ 1/2 ⎟ ⎠ 0 ⎞ 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 ⎛ P1 P2 1/2 0 1/3 0 0 0 P3 1/2 0 0 0 0 0 P4 0 0 0 0 1/2 1 P5 0 0 1/3 1/2 0 0 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ 1/2 ⎟ ⎠ 0 ⎞ A random walk on the Web Graph 4 .

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 ⎛ P1 P2 1/2 0 1/3 0 0 0 P3 1/2 0 0 0 0 0 P4 0 0 0 0 1/2 1 P5 0 0 1/3 1/2 0 0 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ 1/2 ⎟ ⎠ 0 ⎞ A random walk on the Web Graph 4 PageRank = πi = amount of time spent at Pi .

Hyperlink as vote 2 3 Markov chain 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Hyperlink as vote 2 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page. Hyperlink as vote 2 3 6 5 4 .

Hyperlink as vote 2 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Hyperlink as vote 2 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page. Hyperlink as vote 2 3 6 5 4 .

Hyperlink as vote 2 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Hyperlink as vote 2 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page. Hyperlink as vote 2 3 6 5 4 .

Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page. Hyperlink as vote 2 3 6 5 4 .

Hyperlink as vote 2 page 2 is a dangling node 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 ⎛ P1 P2 1/2 0 1/3 0 0 0 P3 1/2 0 0 0 0 0 P4 0 0 0 0 1/2 1 P5 0 0 1/3 1/2 0 0 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ 1/2 ⎟ ⎠ 0 ⎞ A random walk on the Web Graph 4 PageRank = πi = amount of time spent at Pi Dead end page (nothing to click on) — a “dangling node” .

1. 0) = e-vector =⇒ Page P2 is a “rank sink” .Tiny Web 1 2 3 H = 6 5 P1 0 ⎜ P2 ⎜ 0 ⎜ P3 ⎜ ⎜ 1/3 P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 ⎛ P1 P2 1/2 0 1/3 0 0 0 P3 1/2 0 0 0 0 0 P4 0 0 0 0 1/2 1 P5 0 0 1/3 1/2 0 0 P6 0 ⎟ 0 ⎟ ⎟ 0 ⎟ ⎟ 1/2 ⎟ ⎟ ⎟ 1/2 ⎟ ⎠ 0 ⎞ A random walk on the Web Graph 4 PageRank = πi = amount of time spent at Pi Dead end page (nothing to click on) — a “dangling node” πT = (0. 0. 0. 0.

The Fix Allow Web Surfers To Make Random Jumps .

Hyperlink as vote 2 surfer “teleports” 3 6 5 4 .Ranking with a Random Surfer • Rank each page corresponding to a search term by number and quality of votes cast for that page.

.. . . .The Fix Allow Web Surfers To Make Random Jumps T 1 1 1 e . — Replace zero rows with n = n n n ⎛ P1 P1 ⎜ 0 ⎜ P2 ⎜ 1/6 ⎜ P3 ⎜ ⎜ 1/3 ⎜ P4 ⎜ 0 ⎜ P5 ⎜ ⎜ 0 ⎝ 0 P6 P6 ⎞ 1/2 1/2 0 0 0 ⎟ ⎟ 1/6 1/6 1/6 1/6 1/6 ⎟ ⎟ 1/3 0 0 1/3 0 ⎟ ⎟ ⎟ 0 0 0 1/2 1/2 ⎟ ⎟ 0 0 1/2 0 1/2 ⎟ ⎟ ⎠ 0 0 1 0 0 P2 P3 P4 P5 S= .

. . .. — Replace zero rows with n = n n n ⎛ P1 P1 ⎜ 0 ⎜ P2 ⎜ 1/6 ⎜ P3 ⎜ ⎜ 1/3 ⎜ P4 ⎜ 0 ⎜ P5 ⎜ ⎜ 0 ⎝ 0 P6 P6 ⎞ 1/2 1/2 0 0 0 ⎟ ⎟ 1/6 1/6 1/6 1/6 1/6 ⎟ ⎟ 1/3 0 0 1/3 0 ⎟ ⎟ ⎟ 0 0 0 1/2 1/2 ⎟ ⎟ 0 0 1/2 0 1/2 ⎟ ⎟ ⎠ 0 0 1 0 0 P2 P3 P4 P5 S= T a e — S = H + 6 is now row stochastic =⇒ ρ(S) = 1 . .The Fix Allow Web Surfers To Make Random Jumps T 1 1 1 e .

πT = πT S with i πi =1 .t. . .. ..The Fix Allow Web Surfers To Make Random Jumps T 1 1 1 e . — Replace zero rows with n = n n n ⎛ P1 P1 ⎜ 0 ⎜ P2 ⎜ 1/6 ⎜ P3 ⎜ ⎜ 1/3 ⎜ P4 ⎜ 0 ⎜ P5 ⎜ ⎜ 0 ⎝ 0 P6 P6 ⎞ 1/2 1/2 0 0 0 ⎟ ⎟ 1/6 1/6 1/6 1/6 1/6 ⎟ ⎟ 1/3 0 0 1/3 0 ⎟ ⎟ ⎟ 0 0 0 1/2 1/2 ⎟ ⎟ 0 0 1/2 0 1/2 ⎟ ⎟ ⎠ 0 0 1 0 0 P2 P3 P4 P5 S= T a e — S = H + 6 is now row stochastic =⇒ ρ(S) = 1 — Perron says ∃ πT ≥ 0 s.

Nasty Problem The Web Is Not Strongly Connected .

Nasty Problem The Web Is Not Strongly Connected • •• S is reducible ⎛ P1 P1 0 ⎜ P2 ⎜ 1/6 ⎜ P3 ⎜ ⎜ 1/3 ⎜ P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 P2 P3 P6 ⎞ 0 0 0 ⎟ 1/6 1/6 1/6 ⎟ ⎟ 0 1/3 0 ⎟ ⎟ ⎟ 0 1/2 1/2 ⎟ ⎟ ⎟ 1/2 0 1/2 ⎟ ⎠ 1 0 0 P4 P5 1/2 1/2 1/6 1/6 1/3 0 0 0 0 0 0 0 S= .

t.Nasty Problem The Web Is Not Strongly Connected • •• S is reducible ⎛ P1 P1 0 ⎜ P2 ⎜ 1/6 ⎜ P3 ⎜ ⎜ 1/3 ⎜ P4 ⎜ ⎜ 0 ⎜ P5 ⎜ 0 ⎝ 0 P6 P2 P3 P6 ⎞ 0 0 0 ⎟ 1/6 1/6 1/6 ⎟ ⎟ 0 1/3 0 ⎟ ⎟ ⎟ 0 1/2 1/2 ⎟ ⎟ ⎟ 1/2 0 1/2 ⎟ ⎠ 1 0 0 P4 P5 1/2 1/2 1/6 1/6 1/3 0 0 0 0 0 0 0 S= — Reducible =⇒ PageRank vector is not well defined — Frobenius says S needs to be irreducible to ensure a unique πT > 0 s. πT = πT S with i πi = 1 .

Irreducibility Is Not Enough Could Get Trapped Into A Cycle (Pi → Pj → Pi) .

Irreducibility Is Not Enough Could Get Trapped Into A Cycle (Pi → Pj → Pi) — The powers Sk fail to converge .

Irreducibility Is Not Enough
Could Get Trapped Into A Cycle (Pi → Pj → Pi) — The powers Sk fail to converge
T = π — πT k +1 k S fails to convergence

Irreducibility Is Not Enough
Could Get Trapped Into A Cycle (Pi → Pj → Pi) — The powers Sk fail to converge
T = π — πT k +1 k S fails to convergence

Convergence Requirement — Perron–Frobenius requires S to be primitive

Irreducibility Is Not Enough
Could Get Trapped Into A Cycle (Pi → Pj → Pi) — The powers Sk fail to converge
T = π — πT k +1 k S fails to convergence

Convergence Requirement — Perron–Frobenius requires S to be primitive — No eigenvalues other than λ = 1 on unit circle

Irreducibility Is Not Enough Could Get Trapped Into A Cycle (Pi → Pj → Pi) — The powers Sk fail to converge T = π — πT k +1 k S fails to convergence Convergence Requirement — Perron–Frobenius requires S to be primitive — No eigenvalues other than λ = 1 on unit circle — Frobenius proved S is primitive ⇐⇒ Sk > 0 for some k .

The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0. E = eeT /n. 0<α<1 .

.The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0. — G = αH + uvT > 0 E = eeT /n. 0<α<1 vT = eT /n u = αa + (1 − α)e.

0<α<1 vT = eT /n u = αa + (1 − α)e. — G = αH + uvT > 0 — PageRank vector E = eeT /n. πT = left-hand Perron vector of G .The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0.

” it gets lots of votes from other important pages.Ranking with a Random Surfer • If a page is “important. . the surfer spends on each page to create ranking of webpages. which means the random surfer visits it often. • Simply count the number of times. or proportion of time.

” it gets lots of votes from other important pages.29 3 2 Ranked List of Pages Page Page Page Page Page Page 4 6 5 2 1 3 6 5 4 .38 .Ranking with a Random Surfer • If a page is “important. • Simply count the number of times. Proportion of Time Page Page Page Page Page Page 1 2 3 4 5 6 = = = = = = .20 .04 . the surfer spends on each page to create ranking of webpages.04 .05 . which means the random surfer visits it often. or proportion of time.

πT = left-hand Perron vector of G . — G = αH + uvT > 0 — PageRank vector Some Happy Accidents — xT G = αxT H + β vT Sparse computations with the original link structure E = eeT /n. 0<α<1 vT = eT /n u = αa + (1 − α)e.The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0.

0<α<1 vT = eT /n u = αa + (1 − α)e. πT = left-hand Perron vector of G .The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0. — G = αH + uvT > 0 — PageRank vector Some Happy Accidents — xT G = αxT H + β vT — λ2(G) = α Sparse computations with the original link structure Convergence rate controllable by Google engineers E = eeT /n.

πT = left-hand Perron vector of G — vT can be any positive probability vector in G = αH + uvT . — G = αH + uvT > 0 — PageRank vector Some Happy Accidents — xT G = αxT H + β vT — λ2(G) = α Sparse computations with the original link structure Convergence rate controllable by Google engineers E = eeT /n.The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0. 0<α<1 vT = eT /n u = αa + (1 − α)e.

— G = αH + uvT > 0 — PageRank vector Some Happy Accidents — xT G = αxT H + β vT — λ2(G) = α Sparse computations with the original link structure Convergence rate controllable by Google engineers E = eeT /n.The Google Fix Allow A Random Jump From Any Page — G = αS + (1 − α)E > 0. 0<α<1 vT = eT /n u = αa + (1 − α)e. πT = left-hand Perron vector of G — vT can be any positive probability vector in G = αH + uvT — The choice of vT allows for personalization .

Search Issues Spamming • Link Farms .

.

.

Link Farms jefferson SEO client truman lincoln reagan washington clinton bush kennedy .

Search Issues Spamming • Link Farms • Google Bombs .

2003. Newsday newspaper says as few as 32 web pages with the words "miserable failure" link to the Bush biography. E-mail this to a friend Printable version LINKS TO MORE AMERICAS STORIES Select E-mail services | Desktop ticker | Mobiles/PDAs | Back to top ^^ News Front Page | Africa | Americas | Asia-Pacific | Europe | Middle East | South Asia UK | Business | Entertainment | Science/Nature | Technology | Health Have Your Say | Country Profiles | In Depth | Programmes BBCi Homepage >> | BBC Sport >> | BBC Weather >> | BBC World Service >> ABOUT BBC NEWS | Help | Feedback | News sources | Privacy | About the BBC http://news. The trick is possible because Google searches more than just the contents of web pages .BBC NEWS | Americas | 'Miserable failure' links to Bush 1/6/04 1:18 PM NEWS SPORT WEATHER WORLD SERVICE A-Z INDEX SEARCH Low Graphics version | Change edition Feedback | Help News Front Page Last Updated: Sunday.called "Google bombing" . internet users manipulated Google so the phrase "weapons of mass destruction" led to a joke page saying "These Weapons of Mass Destruction cannot be displayed.by linking their sites to a chosen one. when he used it to link the phrase "talentless hack" to a friend's website. members of an online community can affect the results of Google searches . The search engine can be manipulated by a fairly small group of users.uk/2/hi/americas/3298443." The site suggests "clicking the regime change button".co. Web users entering the words "miserable failure" into the popular search engine are directed to the biography of the president on the White House website.bbc. US army battles to keep soldiers Report backs US Catholic bishops Envoys bid to ease BSE fears Protests widen over sky marshals If you are George Bush and typed the country's name in the address bar. 15:04 GMT E-mail this to a friend Printable version 'Miserable failure' links to Bush Africa Americas Asia-Pacific Europe Middle East South Asia UK Business Health Science/Nature Technology Entertainment ----------------Have Your Say ----------------Country Profiles In Depth ----------------Programmes ----------------RELATED SITES George W Bush has been Google bombed. make sure that it is spelled correctly (IRAQ) Prank website LANGUAGES In the run-up to the Iraq war.stm Page 1 of 1 . one report suggested. SEE ALSO: WMD spoof is internet hit 04 Jul 03 | West Midlands Google hit by link bombers 13 Mar 02 | Science/Nature RELATED INTERNET LINKS: Bush has been the target of similar pranks before White House Google bombing The BBC is not responsible for the content of external internet sites TOP AMERICAS STORIES NOW Thus. or "If you are George Bush and typed the country's name in the address bar. and with what words. make sure that it is spelled correctly (IRAQ)". Weblogger Adam Mathes is credited with inventing the practice in 2001. 7 December. The Bush administration has been on the receiving end of pointed Google bombs before.it also counts how often a site is linked to.

. Bush and Miserable Failure in popular search engines." I'm taking part in a new web project. Bush and the phrase 'miserable failure' by copying this link and placing somewhere on your site or blog. I guess. I could add a few other keywords. someone came up with this great idea to link George W. but otherwise it does little to advance the theme. such as "Iraq policy".. frequently searched phrases. Why. They do stupid things. "george bush biography" gets a huge amounts of hits compared to something like "george bush policy"... compliments. Army of Fun Atrios Attorney At Arms Avedon's Sideshow Bag Times BartCop E! BartCop! Bellum Americanum Big Picnic Bitter Obscurity Booknotes Bunsen Burgblog Bush Is A Moron BushFlash BushLiar BusyBusyBusy Byrd's Brain Certain Shade of Green Chimes at Midnight Chris Nelson Circumspect CNN Lies Conniption Counterspin Cursor Daily Brew Daily Cynic Daily Kos Daily Outrage Daily War News Damfacrats Deckie Holmes Democratic Veteran Dodona dratfink Duckwing E Pluribus Unum Estimated Prophet Ethel Federal Examiner Fengi For Freedom Century Frog'N'Blog Ge. 'Den of Thieves' The ? Campaign. For instance "George Bush's Iraq Policy is a miserable failure...com has a keyword suggestion tool that shows how many times certain terms are coming up in searches. the phrases "george Bush miserable failure" were not queried even once in their sample during the month just passed. mendacious.html . JC Christian GeekPol Genoan Sailor GeoDog Get Donkey! That is genius.." The plan shouldn't be to link Miserable Failure to George Bush.com/graymatter/archives/00000654..) I'm in. that's very productive. Inc. runt-like miserable failure? Posted by tim @ 10/29/2003 11:58 AM NY Hahaha. Overture. Thank you very much for your participation. . Replies: 16 people speak up Great idea! Blahroll A Level Gaze A Skeptical Blog Ain't No Bad Dude Angry Bear Ann Slanders Apathy. help raise the link between George W. Posted by Political Pulpit @ 10/28/2003 02:32 PM NY Miserable Failure? I'm down with that. Using that tool. Bush as a Miserable Failure at least once a day.. If you have a blog or web site.10/27/2003: "I'm taking part in a new web project... Posted by Drewcifer @ 10/28/2003 02:35 PM NY Done! Posted by Maru @ 10/28/2003 08:46 PM NY thats great. y'all. but to link Miserable Failure to George Bush and two or three choice. Stay tuned."] 10/27/2003 Archived Entry: "I'm taking part in a new web project. instead of calling it lies--anyone can lie--how about calling it HORSEFEATHERS AND CODSWALLOP! Pin that on him too." 1/6/04 1:32 PM Dusty & Yellowing .welfare for the wealthy... you ask? Well.. According to Overture. like "pathetic"..blah3."] [Main Index] [>>"His heart just isn't in it. 2002 'Fair & Balanced' Day From this day forth. Posted by doodaa @ 10/29/2003 03:01 AM NY Call me a liberal lemming... arguments? Email me. It might be fun at a parties to show how often the two are in the same sentence in a Google search.. [<< "Happy Ramadan. Posted by BJ @ 10/29/2003 09:28 AM NY The key is stating it in connection with terms that will be widely searched. What will work is connecting it to frequent search times. I will post it on my blog now. So someone needs to write about three complete sentences using these terms based on verfiable search results and including the "miserable failure" phrase and then advocate for that exact usage.. Posted by Reek Stankleberry @ 10/29/2003 12:04 PM NY how about. Page 1 of 3 Posted by rlrr @ 10/27/2003 10:06 PM NY http://www. just my 2 cents. Posted by Joe Briefcase @ 10/29/2003 10:51 AM NY how about drunken.. I will refer to George W. It does no good to simply say "George Bush is a miserable failure" because no one will ever search for that. another thing I think might be good to use: tax cuts for the wealthy. This is why everyone knows that liberals are stupid..Archived Weblog Entry . illiterate.The Blah3 Archives Complaints. I can determine that in September the search for "bush george iraq saddam" gets about 12 times more queries than "george bush iraq speech".

Category: Society > History > .Google Search: miserable failure http://www.Cached . > Presidents > Carter.. 2004 I’ll Be Voting For Wesley Clark / Good-Bye Mr... I think retaining 1 of 2 1/22/04 5:14 PM .43k .000.Cached ....Similar pages BBC NEWS | Americas | 'Miserable failure' links to Bush 'Miserable failure' links to Bush.michaelmoore.Cached . Jimmy Carter.html .NY).com/unbound/polipro/pp2003-09-24. Jimmy Carter aspired to make Government "competent and compassionate . Hillary clinton. and the cuckolded dyke Project. Bush Home > President > Biography President George W..36k .. Quotes for the History Books. . Bush — by Michael Moore.blogspot. George Walker www.gov/history/presidents/jc39..Similar pages miserable failure | Hillary Clinton | Hildebeest . Description: Official site of the gadfly of corporations.. Results 1 . George W. Bush En Español.22k .. Miserable Failure.31k .senate. Atlantic Unbound | September 24.html .10 of about 257.Similar pages Senator Hillary Rodham Clinton: Online Office Welcome Page Dear Friend.60k . and the . Category: Society > History > .bbc...co. Description: Official US Senate web site of Senator Hillary Rodham Clinton (D .29k . Tip: In most browsers you can just hit the return key instead of clicking on the search button.. Many of you have written . 2003 Politics & Prose | by Jack Beatty "A Miserable Failure" Will Bush be re-elected? Only if voters . news.Cached .theatlantic.24 .08 seconds.Cached .whitehouse.. Category: Kids and Teens > School Time > . Thank you for visiting my on-line office! I appreciate your interest in the issues before the United States Senate... > Bush.Welcome .uk/2/hi/americas/3298443.. Prank website.Similar pages Atlantic Unbound | Politics & Prose | 2003.9k .. You may also want to check out the Miserable Failure Project....com/ . Michael www. creator of the film Roger and Me and the television show.Similar pages Dick Gephardt for President . .. .. miserable-failure. Advanced Search Preferences Language Tools Search Tips miserable failure Google Search Web Images Groups Directory News Searched the web for miserable failure. > First Ladies > Clinton.com Wednesday. to preserve some large part of the Bush tax cut.google.. Bush is the 43rd President of the United States.com/ . Search took 0. .... Description: Short biography from the official White House site.09.. He . James Earl www... Category: Arts > Celebrities > M > Moore.htm .gov/president/gwbbio..Cached .Cached .whitehouse. Newsday newspaper says as few as 32 web pages with the words "miserable failure" link to the Bush biography. www. Michael Moore. Description: Biography of the president from the official White House web site.Similar pages Biography of President George W.. January 14th.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=.Similar pages Biography of Jimmy Carter Home > History & Tours > Past Presidents > Jimmy Carter..stm .gov/ ..

W. Bush Bio webpage Kim's blog miserable failure .Google Bomb Bob's page miserable failure Jim's blog miserable failure G.

A9.Search Issues Spamming • Link Farms • Google Bombs Personalization • Google’s psearch. Kartoo .

Personalization is Coming .

KartOO visual meta search engine 01/23/2007 12:54 PM http://www.php3 Page 1 of 1 .com/flash04.kartoo.

Search Issues Spamming • Link Farms • Google Bombs Personalization • Google’s psearch. Kartoo Privacy • AOL Data Leak . A9.

A Face Is Exposed for AOL Searcher No. 4417749 - New York Times

01/23/2007 01:10 PM

HOME PAGE

MY TIMES

TODAY'S PAPER

VIDEO

MOST POPULAR

TIMES TOPICS

Free 14-Day Trial

Log In

Register Now

Technology
WORLD U.S. N.Y. / REGION CAMERAS BUSINESS CELLPHONES TECHNOLOGY COMPUTERS SCIENCE HANDHELDS HEALTH SPORTS OPINION ARTS CAMCORDERS HOME VIDEO MUSIC PERIPHERALS

Technology STYLE WI-FI TRAVEL

All NYT JOBS REAL ESTATE AUTOS

A Face Is Exposed for AOL Searcher No. 4417749
By MICHAEL BARBARO and TOM ZELLER Jr. Published: August 9, 2006

More Articles in Technology »

Circuits E-Mail
SIGN IN TO E-MAIL THIS PRINT SINGLE PAGE REPRINTS SAVE

Buried in a list of 20 million Web search queries collected by AOL and recently released on the Internet is user No. 4417749. The number was assigned by the company to protect the searcher’s anonymity, but it was not much of a shield. No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”

Sign up for David Pogue's exclusive column, sent every Thursday. See Sample | Privacy Policy

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.” It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her. AOL removed the search data from its site over the weekend and apologized for its release, saying it was an unauthorized move by a team that had hoped it would benefit academic researchers.

Erik S. Lesser for The New York Times

Thelma Arnold's identity was betrayed by AOL records of her Web searches, like ones for her dog, Dudley, who clearly has a problem.

Multimedia
Graphic: What Revealing Search Data Reveals

But the detailed records of searches conducted by Ms. Arnold and 657,000 other Americans, copies of which continue to circulate online, underscore how much people unintentionally reveal about themselves when they use search engines — and how risky it can be for companies like AOL, Google and Yahoo to compile such data. Those risks have long pitted privacy advocates against online marketers and other Internet companies seeking to profit from the Internet’s unique ability to track the comings and goings of users, allowing for more focused and therefore more lucrative advertising. But the unintended consequences of all that data being compiled, stored and crosslinked are what Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a privacy rights group in Washington, called “a ticking privacy time bomb.” Mr. Rotenberg pointed to Google’s own joust earlier this year with the Justice Department over a subpoena for some of its search data. The company successfully
http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000&en=f6f61949c6da4d38&ei=5090 Page 1 of 3

A Face Is Exposed for AOL Searcher No. 4417749 - New York Times

01/23/2007 01:10 PM

Department over a subpoena for some of its search data. The company successfully fended off the agency’s demand in court, but several other search companies, including AOL, complied. The Justice Department sought the information to help it defend a challenge to a law that is meant to shield children from sexually explicit material. “We supported Google at the time,” Mr. Rotenberg said, “but we also said that it was a mistake for Google to be saving so much information because it creates a risk.” Ms. Arnold, who agreed to discuss her searches with a reporter, said she was shocked to hear that AOL had saved and published three months’ worth of them. “My goodness, it’s my whole personal life,” she said. “I had no idea somebody was looking over my shoulder.” In the privacy of her four-bedroom home, Ms. Arnold searched for the answers to scores of life’s questions, big and small. How could she buy “school supplies for Iraq children”? What is the “safest place to live”? What is “the best season to visit Italy”? Her searches are a catalog of intentions, curiosity, anxieties and quotidian questions. There was the day in May, for example, when she typed in “termites,” then “tea for good health” then “mature living,” all within a few hours. Her queries mirror millions of those captured in AOL’s database, which reveal the concerns of expectant mothers, cancer patients, college students and music lovers. User No. 2178 searches for “foods to avoid when breast feeding.” No. 3482401 seeks guidance on “calorie counting.” No. 3483689 searches for the songs “Time After Time” and “Wind Beneath My Wings.” At times, the searches appear to betray intimate emotions and personal dilemmas. No. 3505202 asks about “depression and medical leave.” No. 7268042 types “fear that spouse contemplating cheating.” There are also many thousands of sexual queries, along with searches about “child porno” and “how to kill oneself by natural gas” that raise questions about what legal authorities can and should do with such information. But while these searches can tell the casual observer — or the sociologist or the marketer — much about the person who typed them, they can also prove highly misleading.

MOST POPULAR
E-MAILED BLOGGED SEARCHED

1 . Taking Middle Schoolers Out of the Middle 2. Ideas & Trends: Why Are There So Many Single Americans? 3 . In Raw World of Sex Movies, High Definition Could Be a View Too Real 4. The Consumer: An Old Cholesterol Remedy Is New Again 5 . Do You Believe in Magic? 6. Your Money: Don’t Call. Don’t Write. Let Me Be. 7 . Novelties: The Turntables That Transform Vinyl 8. Nicholas D. Kristof: Et Tu, George? 9. Refugees Find Hostility and Hope on Soccer Field 10 . Basics: Making Sense of Time, Earthbound and Otherwise
Go to Complete List »

nytimes.com/business

Can a star soccer player save an entire industry? Also in Business: How to covert your vinyl records into MP3 files Gambling subpoenas on Wall Street How did mutual funds fare in the fourth quarter?

Featured Product

At first glace, it might appear that Ms. Arnold fears she is suffering from a wide range of ailments. Her search history includes “hand tremors,” “nicotine effects on the body,” “dry mouth” and “bipolar.” But in an interview, Ms. Arnold said she routinely researched medical conditions for her friends to assuage their anxieties. Explaining her queries about nicotine, for example, she said: “I have a friend who needs to quit smoking and I want to help her do it.”
1 2
NEXT PAGE »

Make a difference. RED MOTORAZR™ V3m from Sprint. This Valentine’s, make a difference. The RED MOTORAZR™ V3m from Sprint. Benefits The Global Fund to help eliminate AIDS in Africa. valentines.sprint.com/coolphones

Saul Hansell contributed reporting for this article.
More Articles in Technology »

Need to know more? 50% off home delivery of The Times.

http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000&en=f6f61949c6da4d38&ei=5090

Page 2 of 3

A Face Is Exposed for AOL Searcher No. 4417749 - New York Times

01/23/2007 01:10 PM

Ads by Google HIPAA-Compliant Laptops
SafeBook Thin Client Notebook $799 No hard drive - protects your data
www.SafeBook.net

what's this?

Secret Satellite TV on PC
Shocking discovery they don't Want you to know.
www.secretsatellite.com

"Must See" Fios Webcasts
Is your records management program at risk? Policies discussed here.
www.FiosInc.com

Related Articles How to Digitally Hide (Somewhat) in Plain Sight (August 12, 2006) Your Life as an Open Book (August 12, 2006) AOL Removes Search Data On Vast Group Of Web Users (August 8, 2006) U.S. Wants Internet Companies to Keep Web-Surfing Records (June 2, 2006) Related Searches America Online Inc Privacy Computers and the Internet Google Inc

INSIDE NYTIMES.COM
N.Y. / REGION » U.S. » MOVIES » BOOKS »

Vinocur: A Price for Denying Holocaust?
Home World U.S.

In Douglass Tribute, Folklore and Fact Collide
N.Y. / Region Business

A Dose of Maturity for a California Protest
Science Health Sports Opinion Arts XML Style Help

Documenting an Obsession and a Marriage
Travel Jobs Real Estate Work for Us

Robert Loomis’s Career in Letters
Automobiles Site Map Back to Top

Technology

Copyright 2006 The New York Times Company

Privacy Policy

Search

Corrections

Contact Us

http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000&en=f6f61949c6da4d38&ei=5090

Page 3 of 3

A9.ch .Search Issues Spamming • Link Farms • Google Bombs Personalization • Google’s psearch. Kartoo Privacy • AOL Data Leak Data Fusion • Search.

ch] 01/23/2007 02:02 PM [ map.search.search.html?x=422&y=-105&z=1024&poi=all&p=610x642 Page 1 of 1 .search.ch/bern.Map: Bern [map.en.ch ] Map: Bern 50 m http://map.

) are combined with content scores for final rankings • Link analysis has dramatically improved search. . • Many continuing CSC and MATH challenges. etc. HITS.Conclusion • Link-based scores (PageRank.

Web Graphs CSC and MATH challenges (problems of scale!) – store adjacency matrix – update adjacency matrix – visualize web graph – locate clusters in graph .

• Many continuing CSC and MATH challenges. • The constant battle between search engines and SEOs means that companies and algorithms must adapt and innovate. HITS. .) are combined with content scores for final rankings • Link analysis has dramatically improved search.Conclusion • Link-based scores (PageRank. etc.

Elegant and Exciting Application of Linear Algebra That is Changing the World . etc. HITS. • The constant battle between search engines and SEOs means that companies and algorithms must adapt and innovate. • Many continuing CSC and MATH challenges.) are combined with content scores for final rankings • Link analysis has dramatically improved search.Conclusion • Link-based scores (PageRank.