and the Choice of the Outlinks
Laure Ninove
Joint work with Cristobald de Kerchove and Paul Van Dooren
CESAME
Université catholique de Louvain, Belgium
CESAME Seminar
February 27, 2007
Laure Ninove (CESAME) Outlinks and PR 1 / 27
guides websurfers in their
visits.
A good ranking is vital for a
How to improve your
Outline
1 Preliminaries: What is under Google’s PageRank?
A brief history
PageRank equations
2
3
For a single node
For a set of nodes
A brief history of the Web search engine Google
1996: a research project, by L. Page and S. Brin
1998: Google Inc. company, 25 million webpages indexed
2005: 8 billion webpages indexed
“The primary goal is to provide high quality search results
over a rapidly growing World Wide Web. Google employs a
number of techniques to improve search quality including
page rank, anchor text, and proximity information.”
Brin & Page, 1998
The anatomy of a large-scale hypertextual web search engine
A hyperlink from i to j is i's vote of confidence in j.
A page j has a high PageRank $\pi_j$ if it is pointed to by many pages with a high PageRank.
Example
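[Figure: directed graph on nodes 1 to 4, labelled with the values 2/11, 2/11, 1/11, 1/11 and 2/11; the PageRank of node 1 is marked "?".]
$\pi_1 = \frac{1}{2}\,\pi_2 + 1 \cdot \pi_4 = \frac{3}{11}$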
PageRank equations
Vote of confidence
$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j, \qquad \sum_j \pi_j = 1$
sum of parents’ weighted scores
normalization of the PageRanks
damping with personalization score
$\pi^T = c\, \pi^T D^{-1} A + (1 - c)\, z^T, \qquad \pi^T e = 1$
$A \in \{0, 1\}^{n \times n}$ (zero diagonal, no zero row)
$D = \mathrm{diag}(Ae)$: outdegrees matrix
$c \in \,]0, 1[$: damping factor
$z > 0$, $z^T e = 1$: personalization vector
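As a minimal sketch of the matrix equation above, the PageRank vector can be obtained by solving the linear system $(I - c\,D^{-1}A)^T \pi = (1 - c)\, z$. The 4-node graph, the value c = 0.85 and the uniform personalization vector are assumptions chosen for illustration; none of them comes from the slides.

```python
import numpy as np

def pagerank(A, c=0.85, z=None):
    """Solve pi^T = c pi^T D^{-1} A + (1 - c) z^T for pi."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                      # outdegrees, D = diag(A e)
    if np.any(d == 0):
        raise ValueError("A must have no zero row")
    # Assumption: uniform personalization vector when none is given.
    z = np.full(n, 1.0 / n) if z is None else np.asarray(z, dtype=float)
    P = A / d[:, None]                     # row-stochastic matrix D^{-1} A
    # Transposed system: (I - c P)^T pi = (1 - c) z
    return np.linalg.solve(np.eye(n) - c * P.T, (1 - c) * z)

# Hypothetical graph: A[i, j] = 1 means there is a link i -> j.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 1],
              [1, 0, 0, 0]])
pi = pagerank(A)
```

Because $D^{-1}A$ is row-stochastic and $z^T e = 1$, the solution satisfies $\pi^T e = 1$ without an explicit normalization step.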
PageRank equations
Random walk
$G = c\, D^{-1} A + (1 - c)\, e z^T$
Irreducible, stochastic matrix $\longrightarrow$ transition probability matrix
Random walk on the webgraph:
$P(i \to j) = G_{ij}$
$P(\text{zap according to } z) = 1 - c$
PageRank vector $\pi$: stationary distribution of this Markov chain
$\pi^T G = \pi^T, \qquad \pi^T e = 1$
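The random-walk view can be illustrated by building G explicitly and iterating $\pi^T \leftarrow \pi^T G$ until it stops changing. This is only a sketch: the graph, c = 0.85, the uniform z, and the helper names `google_matrix` and `stationary` are assumptions made for the example.

```python
import numpy as np

def google_matrix(A, c, z):
    """G = c D^{-1} A + (1 - c) e z^T."""
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)   # D^{-1} A, assumes no zero row
    return c * P + (1 - c) * np.outer(np.ones(len(z)), z)

def stationary(G, tol=1e-12, max_iter=10_000):
    """Power iteration for the left eigenvector pi^T G = pi^T."""
    n = G.shape[0]
    pi = np.full(n, 1.0 / n)               # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ G
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")

# Hypothetical graph and parameters.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 1],
              [1, 0, 0, 0]])
c = 0.85
z = np.full(4, 0.25)
G = google_matrix(A, c, z)
pi = stationary(G)
```

Since z > 0, every entry of G is positive, so the chain is irreducible and aperiodic and the iteration converges to the unique stationary distribution.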
Damping with a personalization score
Example
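[Figure: directed graph on nodes 1 to 4; two nodes carry the value 0.19, the PageRank of node 1 is marked "?", and the contributions to node 1 are labelled c*0.095, c*0.19 and, from the zapping z, (1−c)*0.25.]
$\pi_1 = c \left( \frac{1}{2}\,\pi_2 + \pi_4 \right) + (1 - c)\, z_1$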
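Reading the figure labels as $\pi_2 = \pi_4 = 0.19$ and $z_1 = 0.25$ (an interpretation of the labels, not something the slide states in words), the equation for node 1 can be worked out as a function of the damping factor:

```latex
\pi_1 = c\,(0.095 + 0.19) + (1 - c)\,0.25 = 0.25 + 0.035\,c
```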
$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j$
Ipsen & Wills, 2006
Mathematical properties and analysis of Google’s PageRank
Example
[Figure: two versions of a graph, with node 1 marked in each.]
$\pi_1 = 0.196$ (first graph) $<$ $\pi_1 = 0.245$ (second graph)
$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j$
But no control
Ipsen & Wills, 2006
Mathematical properties and analysis of Google’s PageRank
You control them
Constraints:
no loop
Impact not obvious:
Sydow, 2005
Example
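[Figure: three versions of a graph, with node 1 marked in each.]
$\pi_1 = 0.182$ (first graph) $<$ $\pi_1 = 0.196$ (second graph) $<$ $\pi_1 = 0.211$ (third graph)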
Notation
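Let $\mathcal{I}$ be the considered set of nodes.
Up to a permutation of the indices,
$A = \begin{pmatrix} A_{\mathcal{I}} & A_{\mathrm{out}(\mathcal{I})} \\ A_{\mathrm{in}(\mathcal{I})} & A_{\bar{\mathcal{I}}} \end{pmatrix}.$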
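The block notation can be made concrete with a small helper that extracts the four blocks from an adjacency matrix. The function name `partition`, the dictionary keys and the example graph are hypothetical choices for this sketch.

```python
import numpy as np

def partition(A, I):
    """Split A into the blocks A_I, A_out(I), A_in(I) and A_Ibar."""
    A = np.asarray(A)
    n = A.shape[0]
    I = np.asarray(sorted(I))
    Ibar = np.setdiff1d(np.arange(n), I)   # complement of I, in increasing order
    return {
        "A_I": A[np.ix_(I, I)],            # links inside I
        "A_out": A[np.ix_(I, Ibar)],       # external outlinks of I
        "A_in": A[np.ix_(Ibar, I)],        # external inlinks of I
        "A_Ibar": A[np.ix_(Ibar, Ibar)],   # links among the other nodes
    }

# Hypothetical graph; the considered set is I = {0, 2}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 0]])
blocks = partition(A, [0, 2])
```

Reassembling the blocks with `np.block` gives back A with the rows and columns of I moved to the front, which is the permutation the slide refers to.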
Optimal outlink structure for a single node
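Suppose $\mathcal{I} = \{1\}$.
We want to maximize $\pi_1(A_{\mathrm{out}(\{1\})})$.
With $A_{\mathrm{out}(\{1\})} = e_L^T$, where $L = \{\text{children of } 1\} \neq \emptyset$.
Proposition
$\pi_1(e_L^T)$ is maximal $\iff \emptyset \neq L \subseteq L^{*} = \arg\min_i \, e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e$.
Proof.
$\pi_1(e_L^T) = \dfrac{1}{c\, \dfrac{\sum_{i \in L} e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e}{|L|} + \text{constant}}.$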
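The proposition can be checked by brute force on a small graph: compute $(I - G_{\bar{\mathcal{I}}})^{-1} e$, take its minimizers, and compare with the outlink set that maximizes the PageRank of the chosen node over all non-empty candidate sets. In this sketch node 0 plays the role of node 1 on the slide; the graph, c = 0.85 and the uniform z are assumptions.

```python
import itertools
import numpy as np

def pagerank(A, c, z):
    """Solve pi^T = c pi^T D^{-1} A + (1 - c) z^T."""
    P = A / A.sum(axis=1, keepdims=True)
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - c * P.T, (1 - c) * z)

c = 0.85
n = 4
z = np.full(n, 1.0 / n)
# Hypothetical graph. Row 0 (the outlinks of the considered node) is left
# empty here and filled in for each candidate set L below.
A = np.array([[0, 0, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
others = np.arange(1, n)

# h = (I - G_Ibar)^{-1} e, where G_Ibar is G restricted to the other nodes.
P_rest = A[others] / A[others].sum(axis=1, keepdims=True)
G_rest = (c * P_rest + (1 - c) * z)[:, others]
h = np.linalg.solve(np.eye(n - 1) - G_rest, np.ones(n - 1))
L_star = {int(others[k]) for k in np.flatnonzero(np.isclose(h, h.min()))}

# Brute force over every non-empty set L of children of node 0.
scores = {}
for size in range(1, n):
    for L in itertools.combinations(others.tolist(), size):
        B = A.copy()
        B[0, list(L)] = 1.0
        scores[L] = pagerank(B, c, z)[0]
best = max(scores, key=scores.get)
```

On this graph the minimizer is unique, so the best set found by enumeration should coincide with `L_star`. Note that `G_rest` does not depend on row 0, which is why the criterion can be evaluated before the outlinks are chosen.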
Proposition
Suppose that 1 has some parents. Then
$\pi_1(e_L^T)$ is maximal $\implies L \subseteq \{\text{parents of } 1\}$.
Optimal outlink structure for a single node
Example
[Figure: graph containing nodes 1, 2 and 3.]
In order to maximize its PageRank, node 1 should link
to some node(s) (parents).
But it is better for 1 to link
to node 3 (grand-parent)
rather than to node 2 (parent).
Optimal outlink structure for a set of nodes
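Consider now a set $\mathcal{I}$ of nodes.
$A_{\mathcal{I}}$ given, with $A_{\mathcal{I}}$ having no zero row.
$A_{\mathrm{out}(\mathcal{I})}$ to be determined.
Goal: to maximize the sum of PageRanks:
$\max_{A_{\mathrm{out}(\mathcal{I})}} \sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}).$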
Proposition
Under the assumption that $\mathcal{I}$ has at least m external outlinks,
$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})})$ is maximal $\implies \mathcal{I}$ has exactly m external outlinks.
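For m = 1 the statement can be explored by enumerating every admissible external outlink structure of a small set and recording how many outlinks the best one uses. The 5-node graph, the set I = {0, 1}, c = 0.85 and the uniform z are assumptions of this sketch.

```python
import itertools
import numpy as np

def pagerank(A, c, z):
    """Solve pi^T = c pi^T D^{-1} A + (1 - c) z^T."""
    P = A / A.sum(axis=1, keepdims=True)
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - c * P.T, (1 - c) * z)

c = 0.85
n = 5
z = np.full(n, 1.0 / n)
I, Ibar = [0, 1], [2, 3, 4]

# Hypothetical graph. A_I is a 2-cycle, so it has no zero row.
base = np.zeros((n, n))
base[0, 1] = base[1, 0] = 1.0
base[2, 0] = base[2, 3] = 1.0
base[3, 4] = 1.0
base[4, 1] = base[4, 2] = 1.0

# Enumerate every A_out(I) with at least one external outlink.
slots = [(i, j) for i in I for j in Ibar]
results = []
for bits in itertools.product([0, 1], repeat=len(slots)):
    if sum(bits) == 0:
        continue
    B = base.copy()
    for (i, j), bit in zip(slots, bits):
        B[i, j] = bit
    total = pagerank(B, c, z)[I].sum()     # sum of PageRanks over I
    results.append((total, sum(bits), B[np.ix_(I, Ibar)]))

best_sum, best_count, best_out = max(results, key=lambda r: r[0])
```

Here `best_count` is the number of external outlinks in the best structure found and `best_out` is the corresponding block $A_{\mathrm{out}(\mathcal{I})}$.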
Optimal outlink structure for a set of nodes
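Proof.
Removing a link $i \to j$ from the graph $\iff$ perturbation:
$\tilde{G}^{(i,j)} = c\, (D^{-1} A + e_i\, \delta^{(i,j)T}) + (1 - c)\, e z^T.$
Difference between new and old PageRank sums:
$\sum_{s \in \mathcal{I}} \pi_s^{(i,j)} - \sum_{s \in \mathcal{I}} \pi_s = c\, \pi_i \, \dfrac{\delta^{(i,j)T} (I - c D^{-1} A)^{-1} e_{\mathcal{I}}}{1 - c\, \delta^{(i,j)T} (I - c D^{-1} A)^{-1} e_i}.$
For every link $i \to j$, $c\, \delta^{(i,j)T} (I - c D^{-1} A)^{-1} e_i < 1$.
There exists an external outlink $k \to \ell$ with $k \in \mathcal{I}$, $\ell \notin \mathcal{I}$, such that
$\delta^{(k,\ell)T} (I - c D^{-1} A)^{-1} e_{\mathcal{I}} > 0.$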
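The difference formula can be compared numerically with a direct recomputation of the PageRank after removing one link. The slides do not spell out $\delta^{(i,j)}$; this sketch assumes it is the change in row i of $D^{-1}A$ caused by the removal (new row minus old row). The graph, the set I = {0, 1}, the removed link and c = 0.85 are likewise assumptions.

```python
import numpy as np

c = 0.85
# Hypothetical graph; I = {0, 1}, and the external outlink 1 -> 3 is removed.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0]], dtype=float)
n = A.shape[0]
z = np.full(n, 1.0 / n)
e_I = np.array([1.0, 1.0, 0.0, 0.0])       # indicator vector of I
i, j = 1, 3

def transition(A):
    """Row-stochastic matrix D^{-1} A (assumes no zero row)."""
    return A / A.sum(axis=1, keepdims=True)

def pagerank(A):
    return np.linalg.solve(np.eye(n) - c * transition(A).T, (1 - c) * z)

P = transition(A)
A_removed = A.copy()
A_removed[i, j] = 0.0
# Assumed definition: delta is the change in row i of D^{-1} A.
delta = transition(A_removed)[i] - P[i]

M = np.linalg.inv(np.eye(n) - c * P)       # (I - c D^{-1} A)^{-1}
pi = pagerank(A)
e_i = np.eye(n)[i]
denominator = 1 - c * (delta @ M @ e_i)
predicted = c * pi[i] * (delta @ M @ e_I) / denominator
direct = pagerank(A_removed) @ e_I - pi @ e_I
```

Under this reading the formula is an instance of the Sherman-Morrison identity for a rank-one update of $I - c\,D^{-1}A$, so `predicted` and `direct` should agree up to rounding.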
Optimal outlink structure for a set of nodes
Example
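Sometimes, removing an outlink for $\mathcal{I}$
may decrease the PageRank sum for $\mathcal{I}$.
[Figure: three versions of a graph on nodes 1 to 5, differing in the outlinks of node 1.]
$\sum_{i \in \mathcal{I}} \pi_i(1 \to 3) < \sum_{i \in \mathcal{I}} \pi_i(1 \to 3\ \&\ 5) < \sum_{i \in \mathcal{I}} \pi_i(1 \to 5)$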
Optimal outlink structure for a set of nodes
Special case
Proposition
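Let $\mathcal{I}$ be a set of nodes organized in a clique.
Let $T \subset \mathcal{I}$ be the set of nodes $f$
without any external parent ($A_{\mathrm{in}(\mathcal{I})}\, e_f = 0$),
with a minimal zapping for $\mathcal{I}$ ($z_f = \min_{i \in \mathcal{I}} z_i$).
Suppose that $\mathcal{I}$ must have at least one external outlink. If $T \neq \emptyset$, then
$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})})$ is maximal
$\iff$
$A_{\mathrm{out}(\mathcal{I})} = e_f\, e_\ell^T$
with $f \in T$ and $\ell \in L^{*} = \arg\min_i \, e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e$.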
Summary
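A single webpage, $\mathcal{I} = \{1\}$:
$\pi_1(A_{\mathrm{out}(\mathcal{I})})$ maximal $\iff A_{\mathrm{out}(\mathcal{I})} = e_L^T$ with $\emptyset \neq L \subseteq L^{*}$,
where $L^{*} = \arg\min_i \, e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e$.
Moreover $L^{*} \subseteq \{\text{parents of } 1\}$.
A set $\mathcal{I}$ of at least 2 webpages:
$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})})$ maximal $\implies \mathcal{I}$ has a unique external outlink,
if we suppose that $\mathcal{I}$ must have at least one external outlink.
Under some assumptions: external outlink $k \to \ell$ with $\ell \in L^{*}$.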
Related questions
I
?

Impact not obvious

can decrease the PageRank of one of these pages,
or even, can decrease the sum of their PageRanks!

The clique is not always the optimal internal link structure.