
HOW GOOGLE RANKS WEB PAGES

BRIAN WHITE

Date: November, 2004. Revised June, 2013.

During a typical use of a search engine,

(1) the user types a word,

(2) the engine finds all web pages that contain the given word, and

(3) the engine lists the pages in some order, ideally from most interesting to least interesting.

When Google was launched, there were already a number of search engines. Google's main innovation was a superior way of doing step 3, that is, a better way of ranking web pages. Google's method is called the PageRank algorithm and was developed by Google founders Sergey Brin and Larry Page while they were graduate students at Stanford. At that time, Brin and Page were 23 and 24 years old, respectively. (This article describes the original basis of Google's ranking of webpages; Google makes many modifications to its algorithm every year, in part to keep ahead of people who try to make their pages appear artificially high on the list.)

In the PageRank method, the ranking of pages is based solely on how pages are linked, and not on the content of the pages or on how often the pages are visited. Of course the web is constantly changing, so the rankings change, too. Google calculates the ranks once a month. Each time you do a search, Google gives a list of the relevant pages in order of that month's ranking.

The rankings form the entries of a vector v ∈ R^N, where N is the total number of pages on the web. The entries of v are nonnegative numbers, and v_i is supposed to be a measure of the value (or interest or importance) of page i. Thus in a given search, the first page Google will list is the page i (among those containing the search word) with the highest value of v_i.

In this note, we describe a very simple ranking vector v, followed by several improvements leading to the PageRank method.

1. The Very Simple Method

If page j has a link to page i, imagine that page j is recommending page i. One could rank pages purely by the number of recommendations each page gets. Let N be the total number of web pages (over 4 billion, according to Google). Let A be the N by N matrix whose ij entry is
\[
a_{ij} =
\begin{cases}
1 & \text{if page } j \text{ has a link to page } i,\\
0 & \text{if not.}
\end{cases}
\]
Then the number of recommendations that page i gets is
\[
a_{i1} + a_{i2} + \cdots + a_{iN}.
\]


We can think of this as a measure of the "value" v_i of page i:
\[
v_i = a_{i1} + a_{i2} + \cdots + a_{iN}.
\tag{1}
\]
This equation can be written very concisely using matrix multiplication: v = Au, where u ∈ R^N is the vector each of whose components is 1.

2. First Improvement

To be included on a short list of recommended restaurants (for example) presumably means more than to be included on a long list. This suggests that the weight of a recommendation made by page j should be inversely proportional to the total number of recommendations that j makes. Thus we replace the formula (1) by
\[
v_i = \frac{a_{i1}}{n_1} + \frac{a_{i2}}{n_2} + \cdots + \frac{a_{iN}}{n_N},
\tag{2}
\]
where n_j is the total number of recommendations that page j makes:
\[
n_j = \sum_i a_{ij}.
\]
There is a problem if page j is a "dead end", i.e., if there are no links from page j, because then the formula (2) involves dividing by 0. Google gets around this problem by pretending that a page with no links out in fact has a link to every page, including itself. Thus we redefine the matrix A as follows:
\[
a_{ij} =
\begin{cases}
1 & \text{if page } j \text{ has a link to page } i, \text{ or if page } j \text{ is a dead end,}\\
0 & \text{if there is a link from page } j \text{ to some page, but not to page } i.
\end{cases}
\]
Now n_j = \sum_i a_{ij} will always be positive, so formula (2) makes sense. The formula (2) can be written more succinctly as v = Pu, where P is the N by N matrix whose ij entry is
\[
p_{ij} = \frac{a_{ij}}{n_j}.
\]

3. A Further Improvement

The weight of a recommendation should depend on the importance of the recommender. Since v_j is supposed to reflect the value or importance of page j, we thus multiply the jth term in (2) by v_j to get the formula
\[
v_i = \frac{a_{i1}}{n_1}\,v_1 + \frac{a_{i2}}{n_2}\,v_2 + \cdots + \frac{a_{iN}}{n_N}\,v_N,
\]
or in matrix notation: v = Pv. Thus we are led to conclude that the ranking vector v should be an eigenvector of P with eigenvalue 1. Note that for this to give a ranking of the webpages, 1 had better be an eigenvalue of P. Fortunately, it must be, because P is a Markov matrix. That is, the entries of P are all ≥ 0 and the sum of the entries in each column is 1.
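To make sections 1 through 3 concrete, here is a minimal numpy sketch, illustrative only: the five-page web, its links, and all variable names are invented, not taken from the article. It builds the adjacency matrix A with the dead-end fix, computes the simple recommendation count v = Au, forms the Markov matrix P, and extracts an eigenvector of P with eigenvalue 1.

import numpy as np

# A made-up web of N = 5 pages; links[j] lists the pages that page j links to.
links = {0: [1, 2], 1: [0], 2: [0, 1, 3], 3: [], 4: [0, 3]}  # page 3 is a dead end
N = len(links)

# Adjacency matrix with the dead-end fix: a page with no outgoing links
# is treated as though it linked to every page, including itself.
A = np.zeros((N, N))
for j, outlinks in links.items():
    if outlinks:
        for i in outlinks:
            A[i, j] = 1.0
    else:
        A[:, j] = 1.0

# Section 1: the very simple ranking v = A u counts recommendations received.
u = np.ones(N)
v_simple = A @ u

# Sections 2-3: divide column j by n_j = sum_i a_ij, so P is a Markov matrix.
n = A.sum(axis=0)
P = A / n
assert np.allclose(P.sum(axis=0), 1.0)  # columns sum to 1

# The ranking vector should satisfy v = P v, i.e. be an eigenvector of P
# with eigenvalue 1; normalize it so its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P)
k = np.argmin(np.abs(eigvals - 1.0))
v = np.real(eigvecs[:, k])
v = v / v.sum()
print(v_simple, v)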

Theorem 1. Suppose P is a Markov matrix. Then λ = 1 is an eigenvalue of P. Furthermore, there is an eigenvector v with eigenvalue 1 and with each entry ≥ 0.

Proof. Since the sum of the elements in each column of P is 1, the sum of the elements in each column of the matrix I − P is 0. Thus if we add the first N − 1 rows of I − P to the last row, we get a matrix whose last row is 0. This means that the rank of I − P is at most N − 1, which means that the equation (P − I)v = 0 must have at least one free variable, which means that the equation has a nonzero solution v. Of course such a v is an eigenvector of P with eigenvalue 1. The proof that there is an eigenvector with all entries ≥ 0 is more complicated and we omit it.

The Markov matrix P may be regarded as describing a random walk over the web in which you move from page to page as follows. When you're on a given page, you choose at random one of the links from that page and then you follow that link. (For example, if there are six links, you roll a die to choose which link to follow.) If page j is a dead-end, you choose a page at random from the entire web and move to it. A solution v to the equation v = Pv can be regarded as describing an equilibrium or steady state distribution of surfers on the various pages: if we imagine a large population of web surfers moving around the web in this way, then p_{ij} gives the expected fraction of those surfers on page j who will then move to page i.

According to Theorem 1, the matrix P does have an eigenvector with eigenvalue 1. But if there are several clusters of webpages not connected to each other and having no dead ends (recall that a dead end is treated as though it linked to every web page), then P will have a set of two or more linearly independent eigenvectors with eigenvalue 1. In that case, the equation v = Pv will not determine a unique ranking. For example, suppose that there are only four pages. Suppose that pages 1 and 2 link to each other, and that pages 3 and 4 link to each other, but that there are no links from pages 1 or 2 to pages 3 or 4 or vice versa. The equation v = Pv implies that v_1 = v_2 and that v_3 = v_4, but it gives no information about the relative sizes of v_1 and v_3.

Another problem with v = Pv is that many pages are likely to be tied with a score of 0. Indeed, if it is possible by following links to get from page j to page i but not vice versa, then v_j turns out to be 0.

4. A Final Improvement

One last improvement will give us the PageRank method. Google avoids these problems by replacing P with another Markov matrix Q. Let us first describe the random web walk given by Q. Fix some parameter r with 0 < r < 1. (Google uses r = 0.85.) Imagine you have a biased coin, so that the probability of heads is r. (Of course if r = 1/2, then it is a fair coin.) Now move around the web as follows. If you're on page j, flip the coin. If you get heads, choose randomly one of the links from page j and follow that link. If you're at a dead end (no links), or if the coin comes up tails, pick a page at random from the whole web and jump (or "teleport") to that page.
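The four-page example can be checked numerically. This short sketch (with an invented 0-indexed encoding of the four pages) exhibits two linearly independent solutions of v = Pv, confirming that the equation alone cannot decide between the two clusters.

import numpy as np

# Pages 1 and 2 link only to each other, as do pages 3 and 4
# (encoded here as 0,1 and 2,3). Each column has a single 1, so P is Markov.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

v1 = np.array([0.5, 0.5, 0.0, 0.0])  # all weight on the first cluster
v2 = np.array([0.0, 0.0, 0.5, 0.5])  # all weight on the second cluster
assert np.allclose(P @ v1, v1) and np.allclose(P @ v2, v2)

# Any mixture is also fixed by P, so v = P v says nothing about the
# relative sizes of v_1 and v_3, exactly as the text observes.
t = 0.3
mix = t * v1 + (1 - t) * v2
assert np.allclose(P @ mix, mix)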

In mathematical notation, we let
\[
Q = rP + (1 - r)T,
\]
where T is the N by N "teleportation" matrix, the matrix each of whose entries is 1/N. Note that Q is still a Markov matrix. Furthermore, unlike P, the matrix Q has no zeros. Thus we can apply the following theorem to Q:

Theorem 2. Let Q be a Markov matrix none of whose entries is 0. Then there is a unique vector v such that
(1) the entries of v are all positive: v_i > 0 for all i;
(2) the sum of the entries is 1: \sum_i v_i = 1;
(3) v = Qv.
Furthermore, if w is any vector satisfying condition (2) (i.e., \sum_i w_i = 1), then Q^k w converges to v as k → ∞.

Thus, up to scaling, Q has a unique eigenvector v with eigenvalue 1. We find that eigenvector and normalize it so that the sum of the entries is 1. Then we use v to rank the webpages: the page i with the largest v_i is ranked first, and so on.

5. How to Find v

In general, linear systems are solved by Gaussian elimination. But there are over four billion web pages, so solving v = Qv (or (Q − I)v = 0) by Gaussian elimination takes too long. Indeed, for a computer doing a trillion multiplications per second, it would take about a billion years to solve this system by Gaussian elimination. Fortunately, Theorem 2 gives another method. We begin with any nonzero vector whose entries are all nonnegative. It is convenient to divide this vector by the sum of its entries to get a vector w. Of course the sum of the entries of w will then be 1. We repeatedly multiply by Q to get Qw, Q^2 w, Q^3 w, and so on. By Theorem 2, these vectors converge to v. So we keep going until the sequence settles down, i.e., until Q^{k+1} w is nearly equal to Q^k w. Then Q^k w is (for all practical purposes) our eigenvector v.

Even just multiplying a huge vector by a huge matrix could take a while. However, for the particular matrix Q, we can find Qw much more easily than it might first appear. Note that
\[
Qw = rPw + (1 - r)Tw.
\]
Each entry of Tw is just the average of the entries of w. Since w has N entries that add up to 1, the average is 1/N. Thus Tw = u/N, so
\[
Qw = rPw + \frac{1 - r}{N}\,u.
\]
Note also that the multiplication Pw can be done rather quickly because the vast majority of the entries of P are 0.

Thus calculating the eigenvector v takes Google days rather than eons.
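The iteration of section 5 is easy to sketch. What follows is a minimal illustration, not Google's implementation: the function name, tolerance, and iteration cap are invented, and P is assumed to be a column-stochastic matrix with dead ends already fixed. The key point is the identity Qw = rPw + ((1 − r)/N)u, which avoids ever forming the dense matrix Q.

import numpy as np

def pagerank(P, r=0.85, tol=1e-10, max_iter=1000):
    """Power iteration: repeatedly apply Q until Q^(k+1) w is nearly Q^k w."""
    N = P.shape[0]
    w = np.full(N, 1.0 / N)  # any starting vector whose entries sum to 1
    for _ in range(max_iter):
        # Q w = r P w + ((1 - r)/N) u, with u the all-ones vector;
        # the scalar (1 - r)/N broadcasts to every entry.
        w_next = r * (P @ w) + (1.0 - r) / N
        if np.abs(w_next - w).sum() < tol:  # the sequence has settled down
            return w_next
        w = w_next
    return w

# The four-page cluster example: teleportation makes every entry positive
# and the answer unique; by symmetry it is [0.25, 0.25, 0.25, 0.25].
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
print(pagerank(P))

For the real web, P would be stored as a sparse matrix, so that P @ w touches only the nonzero entries; that, together with the identity above, is what makes each step fast.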
