Google’s PageRank and the Choice of the Outlinks
Laure Ninove
Joint work with Cristobald de Kerchove and Paul Van Dooren
CESAME
Université catholique de Louvain, Belgium
CESAME Seminar
February 27, 2007
Laure Ninove (CESAME) Outlinks and PR 1 / 27
Google’s power
Google’s search engine
guides websurfers in their
visits.
A good ranking is vital for a
webpage to be read.
How to improve your
Google rank?
Outline
1. Preliminaries: What is under Google’s PageRank?
   - A brief history
   - A story of links
   - PageRank equations
2. How to improve your PageRank?
   - Add inlinks
   - Choose outlinks
3. Optimal outlink structure
   - For a single node
   - For a set of nodes
A brief history of the Web search engine Google
1996: a research project, by L. Page and S. Brin
1998: Google Inc. company, 25 million webpages indexed
2005: 8 billion webpages indexed
2006: "to google" added to the Oxford English Dictionary
“The primary goal is to provide high quality search results
over a rapidly growing World Wide Web. Google employs a
number of techniques to improve search quality including
page rank, anchor text, and proximity information.”
Brin & Page, 1998
The anatomy of a large-scale hypertextual web search engine
Google’s PageRank: a story of links
A hyperlink from i to j is i’s vote of confidence in j.
A page j has a high PageRank $\pi_j$ if it is pointed to by many pages with
- a high PageRank,
- few outlinks.
Votes of confidence
Example
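[Figure: webgraph with nodes 1 to 4, labelled with the values 2/11, 2/11, 1/11, 1/11, 2/11 and a "?" for the unknown score.]
$$\pi_1 = \tfrac{1}{2}\,\pi_2 + 1\cdot\pi_4 = \tfrac{3}{11}$$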
PageRank equations
Vote of confidence
$$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1-c)\,z_j, \qquad \sum_j \pi_j = 1$$
- sum of parents’ weighted scores
- normalization of the PageRanks
- damping with personalization score
In matrix form:
$$\pi^T = c\,\pi^T D^{-1} A + (1-c)\,z^T, \qquad \pi^T e = 1$$
- $A \in \{0,1\}^{n \times n}$: webgraph’s adjacency matrix (zero diagonal, no zero row)
- $D = \mathrm{diag}(Ae)$: outdegrees matrix
- $c \in \,]0,1[$: damping factor
- $z > 0$, $z^T e = 1$: personalization vector
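The matrix equation above is a linear system in $\pi$, so on a small graph it can be solved directly. The following is a minimal sketch, assuming NumPy, a uniform personalization vector by default, and a hypothetical 4-node webgraph that is not taken from the slides; the function name `pagerank` is an illustrative choice.

```python
import numpy as np

def pagerank(A, c=0.85, z=None):
    """Solve pi^T = c pi^T D^{-1} A + (1 - c) z^T for pi."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                      # outdegrees d_i = (A e)_i
    if np.any(d == 0):
        raise ValueError("every node needs at least one outlink")
    P = A / d[:, None]                     # D^{-1} A, row-stochastic
    z = np.full(n, 1.0 / n) if z is None else np.asarray(z, dtype=float)
    # Transposed system: (I - c P)^T pi = (1 - c) z
    return np.linalg.solve(np.eye(n) - c * P.T, (1.0 - c) * z)

# Hypothetical webgraph (0-indexed): 0->1, 1->0, 1->2, 2->0, 3->0, 3->2
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 1, 0]])
pi = pagerank(A)
print(pi, pi.sum())
```

Because $D^{-1}A$ is row-stochastic and $z^T e = 1$, the solution sums to one without a separate normalization step.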
Random walk
Google matrix:
$$G = c\,D^{-1}A + (1-c)\,e z^T$$
Irreducible, stochastic matrix → transition probability matrix
Random walk on the webgraph:
$$P(i \to j) = G_{ij}, \quad \text{with} \quad P(\text{follow hyperlinks}) = c, \quad P(\text{zap according to } z) = 1 - c$$
PageRank vector $\pi$: stationary distribution of this Markov chain
$$\pi^T G = \pi^T, \qquad \pi^T e = 1$$
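The random-walk view suggests computing $\pi$ by repeatedly applying $G$ to a starting distribution (power iteration). This is a minimal sketch under assumptions: NumPy, a uniform $z$, a hypothetical 4-node graph, and illustrative function names and tolerances that do not come from the talk.

```python
import numpy as np

def google_matrix(A, c=0.85, z=None):
    """Build G = c D^{-1} A + (1 - c) e z^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    z = np.full(n, 1.0 / n) if z is None else np.asarray(z, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)   # assumes no zero row in A
    return c * P + (1.0 - c) * np.outer(np.ones(n), z)

def stationary_distribution(G, tol=1e-12, max_iter=10_000):
    """Power iteration pi^T <- pi^T G, started from the uniform distribution."""
    n = G.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_pi = pi @ G
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("power iteration did not converge")

# Hypothetical webgraph (0-indexed): 0->1, 1->0, 1->2, 2->0, 3->0, 3->2
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 1, 0]])
G = google_matrix(A)
pi = stationary_distribution(G)
print(pi)
```

Each step of the loop is one step of the surfer: with probability $c$ a hyperlink is followed, with probability $1 - c$ the surfer zaps according to $z$.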
Damping with a personalization score
Example
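[Figure: four-node webgraph (nodes 1 to 4) with damping, labelled with the values 0.19, c*0.095, c*0.19, 0.19, the zapping term (1−c)*0.25 coming from z, and a "?" for the unknown score.]
$$\pi_1 = c\left(\tfrac{1}{2}\,\pi_2 + \pi_4\right) + (1-c)\,z_1$$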
How to improve your PageRank?
Add inlinks
Add inlinks?
$$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1-c)\,z_j$$
Always increases your PR
Ipsen & Wills, 2006
Mathematical properties and analysis of Google’s PageRank
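A small numerical check of the inlink effect can be sketched as follows. The 4-node graph, the uniform personalization vector, and the helper name `pagerank` are assumptions made for illustration; they are not the example used in the talk.

```python
import numpy as np

def pagerank(A, c=0.85):
    """PageRank with uniform personalization, via a direct linear solve."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # assumes no zero row in A
    z = np.full(n, 1.0 / n)
    return np.linalg.solve(np.eye(n) - c * P.T, (1.0 - c) * z)

# Hypothetical webgraph (0-indexed): 0->1, 1->2, 2->0, 2->3, 3->2
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
before = pagerank(A)[0]

A_inlink = A.copy()
A_inlink[3, 0] = 1                         # add the inlink 3->0
after = pagerank(A_inlink)[0]

print(before, after)
```

On this toy graph the score of node 0 goes up once node 3 starts linking to it, in line with the statement above.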
Example
[Figure: the webgraph around node 1, without and with an added inlink.]
$$\pi_1 = 0.196 < \pi_1^{(\text{inlink})} = 0.245$$
But you have no control over your inlinks
Choose outlinks
Choose outlinks?
You control them
Constraints:
at least one outlink
no loop
Impact not obvious:
adding outlinks can increase or decrease your PR
Sydow, 2005
Can one out-link change your PageRank?
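Both directions can be observed on a toy graph. The sketch below is an illustration under assumptions: a hypothetical 4-node graph (node 0 plays the role of "your" page), a uniform personalization vector, and an illustrative helper `pagerank_of_node0` that rebuilds the graph for a given choice of outlinks.

```python
import numpy as np

c = 0.85
n = 4
z = np.full(n, 1.0 / n)

# Links of the other pages (0-indexed): 1->0, 2->1, 3->1, 3->2
base = np.zeros((n, n))
for i, j in [(1, 0), (2, 1), (3, 1), (3, 2)]:
    base[i, j] = 1.0

def pagerank_of_node0(outlinks):
    """PageRank of node 0 when its outlinks go to the given nodes."""
    A = base.copy()
    A[0, list(outlinks)] = 1.0
    P = A / A.sum(axis=1, keepdims=True)
    pi = np.linalg.solve(np.eye(n) - c * P.T, (1.0 - c) * z)
    return pi[0]

only_1 = pagerank_of_node0([1])       # single outlink 0->1
only_2 = pagerank_of_node0([2])       # single outlink 0->2
both = pagerank_of_node0([1, 2])      # outlinks 0->1 and 0->2

print(only_1, both, only_2)
```

Starting from the single outlink 0->1, adding 0->2 lowers the score of node 0; starting from the single outlink 0->2, adding 0->1 raises it.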
Example
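[Figure: three variants of the webgraph around node 1: with outlink a, the original graph, and with outlink b.]
$$\pi_1^{(\text{outlink } a)} = 0.182 < \pi_1 = 0.196 < \pi_1^{(\text{outlink } b)} = 0.211$$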
Notation
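Let $\mathcal{I}$ be the considered set of nodes, and $\bar{\mathcal{I}}$ its complement.
Up to a permutation of the indices,
$$A = \begin{pmatrix} A_{\mathcal{I}} & A_{\mathrm{out}(\mathcal{I})} \\ A_{\mathrm{in}(\mathcal{I})} & A_{\bar{\mathcal{I}}} \end{pmatrix}.$$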
Optimal outlink structure for a single node
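Suppose $\mathcal{I} = \{1\}$.
We want to maximize $\pi_1(A_{\mathrm{out}(\{1\})})$,
with $A_{\mathrm{out}(\{1\})} = e_L^T$, where $L = \{\text{children of } 1\} \neq \emptyset$.
Proposition
$$\pi_1(e_L^T) \text{ is maximal} \iff \emptyset \neq L \subseteq L^{*} = \arg\min_i \; e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e.$$
Proof.
$$\pi_1(e_L^T) = \frac{1}{\,c\,\dfrac{\sum_{i \in L} e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e}{|L|} + \text{constant}\,}.$$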
Proposition
Suppose that node 1 has some parents. Then
$$\pi_1(e_L^T) \text{ is maximal} \implies L \subseteq \{\text{parents of } 1\}.$$
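Both propositions can be checked by brute force on a small instance. The sketch below rests on assumptions: a hypothetical 4-node graph in which node 0 plays the role of node 1 of the slides, a uniform personalization vector, and illustrative names (`pr0`, `L_star`, `parents`). It enumerates every non-empty outlink set of node 0 and compares the best one with the minimizers of $(I - G_{\bar{\mathcal{I}}})^{-1} e$.

```python
from itertools import combinations
import numpy as np

c = 0.85
n = 4
z = np.full(n, 1.0 / n)

# Links of the other nodes (0-indexed): 1->0, 2->1, 3->1, 3->2
base = np.zeros((n, n))
for i, j in [(1, 0), (2, 1), (3, 1), (3, 2)]:
    base[i, j] = 1.0

def pr0(L):
    """PageRank of node 0 when its set of children is L."""
    A = base.copy()
    A[0, list(L)] = 1.0
    P = A / A.sum(axis=1, keepdims=True)
    return np.linalg.solve(np.eye(n) - c * P.T, (1.0 - c) * z)[0]

others = [1, 2, 3]
candidates = [L for k in range(1, len(others) + 1)
              for L in combinations(others, k)]
best_L = max(candidates, key=pr0)          # brute-force optimum

# Restriction of G to the other nodes; it does not depend on node 0's outlinks
P_bar = base[1:, 1:] / base[1:].sum(axis=1, keepdims=True)
G_bar = c * P_bar + (1.0 - c) * np.outer(np.ones(n - 1), z[1:])
h = np.linalg.solve(np.eye(n - 1) - G_bar, np.ones(n - 1))

L_star = {others[k] for k in np.flatnonzero(np.isclose(h, h.min()))}
parents = {int(i) for i in np.flatnonzero(base[:, 0])}

print(best_L, L_star, parents)
```

The vector `h` can be read as the expected number of steps the random surfer needs to come back to node 0 from each other node, which is why linking to its minimizers is the best choice.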
Example
[Figure: webgraph containing nodes 1, 2 and 3, with three nodes marked by a star (*).]
In order to maximize its PageRank,
node 1 should link
to some node(s) (parents).
But it is better for 1 to link
to node 3 (grand-parent)
rather than to node 2 (parent).
Optimal outlink structure for a set of nodes
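Consider now a set $\mathcal{I}$ of nodes.
Internal link structure $A_{\mathcal{I}}$ given, where $A_{\mathcal{I}}$ has no zero row.
External outlink structure $A_{\mathrm{out}(\mathcal{I})}$ to be determined.
Goal: to maximize the sum of PageRanks:
$$\max_{A_{\mathrm{out}(\mathcal{I})}} \; \sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}).$$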
Proposition
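Under the assumption that $\mathcal{I}$ has at least $m$ external outlinks,
$$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}) \text{ is maximal} \implies \mathcal{I} \text{ has exactly } m \text{ external outlinks}.$$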
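For $m = 1$, the statement can be explored by enumerating all admissible external outlink structures of a tiny set. This is only a sketch on a hypothetical instance: a 4-node graph with $\mathcal{I} = \{0, 1\}$ (0-indexed), a fixed internal structure 0->1, 1->0, a uniform personalization vector, and illustrative names such as `pagerank_sum` and `external_links`.

```python
from itertools import product
import numpy as np

c = 0.85
n = 4
z = np.full(n, 1.0 / n)
I_set = [0, 1]                                       # the set I
external_links = [(0, 2), (0, 3), (1, 2), (1, 3)]    # candidate outlinks of I

# Fixed links: internal 0->1, 1->0; outside nodes 2->0, 3->2
base = np.zeros((n, n))
for i, j in [(0, 1), (1, 0), (2, 0), (3, 2)]:
    base[i, j] = 1.0

def pagerank_sum(chosen):
    """Sum of the PageRanks of I for a given list of external outlinks."""
    A = base.copy()
    for i, j in chosen:
        A[i, j] = 1.0
    P = A / A.sum(axis=1, keepdims=True)
    pi = np.linalg.solve(np.eye(n) - c * P.T, (1.0 - c) * z)
    return pi[I_set].sum()

m = 1                                                # required minimum
configs = []
for mask in product([0, 1], repeat=len(external_links)):
    chosen = [link for link, bit in zip(external_links, mask) if bit]
    if len(chosen) >= m:
        configs.append(chosen)

best = max(configs, key=pagerank_sum)
print(best, pagerank_sum(best))
```

The enumeration covers the 15 structures with at least one external outlink; the number of links in `best` can then be compared with $m$.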
Proof.
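Removing a link $i \to j$ from the graph $\iff$ perturbation:
$$\tilde{G}^{(i,j)} = c\,\bigl(D^{-1}A + e_i\,\delta^{(i,j)T}\bigr) + (1-c)\,e z^T.$$
Difference between new and old PageRank sums:
$$\sum_{s \in \mathcal{I}} \pi_s^{(i,j)} - \sum_{s \in \mathcal{I}} \pi_s = c\,\pi_i\, \frac{\delta^{(i,j)T} (I - cD^{-1}A)^{-1} e_{\mathcal{I}}}{1 - c\,\delta^{(i,j)T} (I - cD^{-1}A)^{-1} e_i}.$$
For every link $i \to j$, $\; c\,\delta^{(i,j)T} (I - cD^{-1}A)^{-1} e_i < 1$.
There exists an external outlink $k \to \ell$ with $k \in \mathcal{I}$, $\ell \notin \mathcal{I}$, such that
$$\delta^{(k,\ell)T} (I - cD^{-1}A)^{-1} e_{\mathcal{I}} > 0.$$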
Example
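Sometimes, removing an outlink of $\mathcal{I}$
may decrease the PageRank sum of $\mathcal{I}$.
[Figure: three copies of a five-node webgraph (nodes 1 to 5), in which node 1 links to 3, to both 3 and 5, and to 5 respectively.]
$$\sum_{i \in \mathcal{I}} \pi_i(1 \to 3) \;<\; \sum_{i \in \mathcal{I}} \pi_i(1 \to 3 \,\&\, 5) \;<\; \sum_{i \in \mathcal{I}} \pi_i(1 \to 5)$$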
Optimal outlink structure for a set of nodes
Special case
Proposition
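Let $\mathcal{I}$ be a set of nodes organized in a clique.
Let $\mathcal{T} \subset \mathcal{I}$ be the set of nodes $f$
- without any external parent ($A_{\mathrm{in}(\mathcal{I})}\, e_f = 0$),
- with a minimal zapping for $\mathcal{I}$ ($z_f = \min_{i \in \mathcal{I}} z_i$).
Suppose that $\mathcal{I}$ must have at least one external outlink. If $\mathcal{T} \neq \emptyset$, then
$$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}) \text{ is maximal} \iff A_{\mathrm{out}(\mathcal{I})} = e_f\, e_\ell^T$$
with $f \in \mathcal{T}$ and $\ell \in L^{*} = \arg\min_i \; e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e$.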
Summary
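A single webpage 1:
$$\pi_1(A_{\mathrm{out}(\mathcal{I})}) \text{ maximal} \iff A_{\mathrm{out}(\mathcal{I})} = e_L^T \text{ with } \emptyset \neq L \subseteq L^{*},$$
where $L^{*} = \arg\min_i \; e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e$.
Moreover $L^{*} \subseteq \{\text{parents of } 1\}$.
A set $\mathcal{I}$ of at least 2 webpages:
$$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}) \text{ maximal} \implies \mathcal{I} \text{ has a unique external outlink},$$
if we suppose that $\mathcal{I}$ must have at least one external outlink.
Under some assumptions: external outlink $k \to \ell$ with $\ell \in L^{*}$.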
Related questions
Modify internal link structure $A_{\mathcal{I}}$?
- Impact not obvious
- Adding a link between two pages of $\mathcal{I}$ can decrease the PageRank of one of these pages, or can even decrease the sum of their PageRanks!
- The clique is not always the optimal internal link structure.
Add servant pages: link spam farms
[Figure: a link farm and its target page.]